AD-A198  429 


OUt  rll-E  ^ 


RADC-TR-88-17 
Final  Technical  Report 
February  1988 


NETWORK  MANAGEMENT  OF  HIGHLY 
ADAPTIVE  COMMUNICATION 
NETWORKS 


Southeastern  Center  for  Electrical  Engineering  Education 


Jeffery  L.  Kennington,  Richard  V.  Helgason  and  John  M.  Colombi,  2Lt,  USAF 


APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED. 


0T\ 

-,£L.ECTE|\ 

j  BP 


This  effort  was  funded  totally  by  the  Laboratory  Directors’  fund. 


ROME  AIR  DEVELOPMENT  CENTER 
Air  Force  Systems  Command 
Griff iss  Air  Force  Base,  NY  13441-5700 


88  8 


19  4  ' 


-  ■.  *-  .  « .  *  .%  ■*  .v  -  *  m  ^  r>  r/ W/  v  /  -•v  a.  »*-  A- 


RADC-TR-88-17  has  been  reviewed  and  is  approved  for  publication. 


APPROVED:  'Til 


JOHN  M.  COLOMBI,  2Lt,  USAF 
Project  Engineer 


APPROVED: 


BRUNO  BEEK 
Technical  Director 
Directorate  of  Communications 


// 


FOR  THE  COMMANDER 


1DER :  £ 


JAMES  W.  HYDE,  III 
Directorate  of  Plans  &  Programs 


DESTRUCTION  NOTICE  -  For  classified  documents,  follow  the  procedures  in  DOD 
5200. 22-M,  Industrial  Security  Manual,  or  DOD  5200. 1-R,  Information  Security 
Program  Regulation.  For  unclassified  limited  documents,  destroy  by  any 
method  that  will  prevent  disclosure  of  contents  or  reconstruction  of  the 
document . 

If  your  address  has  changed  or  if  you  wish  to  be  removed  from  the  RADC  mailing 
list,  or  if  the  addressee  is  no  longer  employed  by  your  organization,  please 
notify  RADC  (DCLD)  Griff iss  AFB  NY  13441-5700.  This  will  assist  us  in 
maintaining  a  current  mailing  list. 


Do  not  return  copies  of  this  report  unless  contractual  obligations  or  notices 
on  a  specific  document  requires  that  it  be  returned. 


UNCLASSIFIED 

CURlTY  CLASS'FCATON  OP  ThiS  >0 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  NO  0704-0188 


1*  REPORT  SECURITY  CLASSIFICATION 

UNCLASSIFIED 

2*  SECURITY  CLASSIFICATION  authority 


1b  RESTRICTIVE  MARKINGS 

N/A _ 

3  DISTRIBUTION /AVAILABILITY  OF  REPORT 


2b  DECLASSIFICATION /DOWNGRADING  SCHEDULE 


Approved  for  public  release; 
distribution  unlimited. 


4  PERFORMING  ORGANIZATION  REPORT  NUMBER(S) 
N/A 


5.  MONITORING  ORGANIZATION  REPORT  NUMBER(S) 


6a  NAME  OF  PERFORMING  ORGANIZATION 
Southeastern  Center  for 
Electrical  Engineering  Educatioi 

6c  ADDRESS  {City.  Suit,  and  ZIP  Code) 
Central  Florida  Facility 
1101  Massachusetts  Avenue 
St.  Cloud  FL  32769 


6b  OFFICE  SYMBOL 
(If  applicable) 


RADC-TR-88-17 

7a  NAME  OF  MONITORING  ORGANIZATION 


Rome  Air  Development  Center  (DCLD) 
7b  ADDRESS  (City,  State,  and  ZIP  Code) 


Griff iss  AFB  NY  13441-5700 


8a.  NAME  OF  FUNDING /SPONSORING 
ORGANIZATION 


Rome  Air  Development  Center 
8c  ADDRESS  (C/ty,  State,  and  ZIP  Code) 


80  OFFICE  SYMBOL 
(If  applicable) 

DCLD 


f 9  PROCUREMENT  INSTRUMENT  IDENTIFICATION  NUMBER 


F30602-81-C-0193 

10  SOURCE  OF  FUNDING  NUMBERS 


Griffiss  AFB  NY  13441-5700 


PROGRAM 
ELEMENT  NO 


PROJECT 

NO 


I  WORK  UNIT 
ACCESSION  NO 


61101F 


1 1  TITLE  (Include  Security  Classification) 

NETWORK  MANAGEMENT  OF  HIGHLY  ADAPTIVE  COMMUNICATION  NETWORKS 


12  PERSONAL  AUTHOR(S) 

Jeffrey  L.  Kennington,  Richard  V.  Helgason,  John  M.  Colombi,  2Lt,  USAF 

13a  TYPE  OF  REPORT  3b  TIME  COVERED  |l«  DATE  OF  REPORT  (Year,  Month.  Day)  |15  PAGE  COUNT 

Final  from  Aug  86  to  Jul  87  February  1988  _ 128 

16  SUPPLEMENTARY  NOTATION 


iThis  effort  was  funded  totally  by  the  Laboratory  Directors'  fund. 


17  COSATi  CODES  18  SUBJECT  TERMS  (Continue  on  reverse  if  necessary  and  identify  by  block  number) 

f|EiP _ GROUP _ Su8-grqup  Operations  Research  Communications  Networks 

_ 1_7 _ 02 _ Parallel  Processing  Network  Flow  Problems 


1 9  ABSTRACT  ( Continue  on  reverie  if  necessary  and  identify  by  block  number) 

This  report  documents  networking  models,  network  solutions,  programing  techniques  for 
parallel  processing,  and  parallelized  algorithm  comparisons.  Several  papers  are  contained 
in  the  report.  An  operational  research  model  and  associated  mathematics  are  presented  for  a 
three  node  network.  A  multi-media  nodal  simulation  is  developed  to  optimally  assign  trunks. 
A  new  mathematical  approach  is  shown  for  solving  equal  flow  problems.  This  technique  makes 
greater  use  of  the  side  constraints  structure  with  computational  solutions  given.  Also 
developed  are  the  mathematical  theory  and  justification  of  using  the  quadrant  interlocking 
factorization  for  solving  the  simplex  algorithm  on  a  parallel  processor.  Lastly,  computa¬ 
tional  results  of  solving  minimal  spanning  tree  problems,  on  a  parallel  processor  are 
presented. 


20  DISTRIBUTION /AVAILABILITY  OF  ABSTRACT 
BuNCLASSIFIEDAJNLIMITED  □  SAME  AS  RPT 
22*  NAME  OF  RESPONSIBLE  INDIVIDUAL 
John  M.  Colombi,  2Lt,  USAF _ 


21  ABSTRACT  SECURITY  CLASSIFICATION 
□  otic  USERS  UNCLASSIFIED 

22b  TELEPHONE  (Include  Area  Code)  22c  OFFICE  SYMBOL 

(315)  330-7751  RADC  (DCLD) 


00  Form  1473,  JUN  86 


Previous  editions  are  obsolete 


SECURITY  CLASSIFICATION  OF  THIS  PAGE 
UNCLASSIFIED 


mmm 


VIV1 


by  the  Rome  Air 
er  F30603-81-0193. 
Laboratory  Director's 


I  INTRODUCTION 

This  report  details  the  work,  accomplished  under  the  US 
Air  Force,  Rome  Air  Development  Center  Contract  No. 
F30602-8 1- C- 0193  .  The  twelve  month  Post  Doctoral  effort  used 
as  a  foundation  the  education  of  the  Department  of  Operations 
Research,  School  of  Engineering  and  Applied  Sciences  at 
Southern  Methodist  University,  Dallas,  Texas.  The  principal 
investigator  was  Dr.  Jeffery  L.  Kennington. 

The  objective  was  to  take  optimization  theory  developed 
by  the  Air  Force  Office  of  Scientific  Research  for  solving 
multi  commodity  problems,  and  apply  it  to  the  topic  of 
communication  network  management.  Under  Project  Forecast 

II  ( PFI I ) ,  a  concept  for  a  highly  adaptive  communication 
network  was  developed.  This  complex  and  survlvaole  network 
will  require  new  techniques  for  managing  itB  available 
resources.  These  results  will  further  develop  classes  of 
algorithms  that  can  manage  this  network.  The  results 
contained  are  a  compilation  of  five  separate  reports, 
contained  in  Appendices  A  -  E. 


2  SCOPE 


The  structure  of  this  report  oot  only  reflects  tbe  work, 
done,  but  tbe  evolution  of  this  work,  as  well.  Initially, 
efforts  concentrated  on  modelling  communication  networks  and 
later  progressed  into  algorithm  development  for  high 
performance  computer  architectures. 

Appendix  A  summarizes  an  initial  OR  model  for  a  three 
node  network.  The  solution  makes  use  of  a  commodity  based 
supply  and  demand  model. 

Appendix  3  is  a  new  technique  for  handling  network  flow 
solutions.  In  particular,  the  method  is  a  means  for  solving 
linear  equal  flow  problems,  by  more  efficiency  utilizing  tbe 
special  structure  of  the  side  constraints.  Computational 
solution  time  results  are  given. 

Appendix  C  summarizes  the  development  of  an  optimization 
model  for  a  single  communications  node  having  multiple  media. 
This  code  provides  an  initial  capability  for  examining  node 
managment  and  its  functions.  Incorporated  into  the  model 
are  capacity  and  delay  constraints. 

Appendix  D  is  a  Ph.D.  dissertation  of  Dr,  Hossam  Zaki 
who  was  supported,  on  this  effort.  This  report  presents  the 
quadrant  interlocking  factorization  for  solving  the  simplex 
algorithm  for  linear  problems  on  parallel  machines.  Included 
in  the  report  are  the  relevant  justifications  and  algorithms 
required  for  Implementation. 

Appendix  E  contains  computational  results  of  work  done 
on  a  twenty-CPU  Balance  21000  computer.  Algorithms  for 
solving  minimal  spanning  trees  were  investigated. 
Comparisons  of  sequential  and  parallel  solutions  for  these 
algorithms  are  presented. 


3  CONCLUSIONS 

As  presented  in  many  of  the  references,  much  work  is 
being  accomplished  in  OR  techniques  for  a  variety  of 
applications.  Many  more  communications  network  management 
functions  and  networking  problems  are  being  near  optimally 
handl ed . 


The  next  major  efficiency  increase  appears  Co  be  found 
in  more  powerful  computer  architectures  with  new 
"sup ercomputing"  algorithms.  One  specific  application  which 
needs  development  is  communication  algorithms  for  parallel 
processing  machines,  and  the  last  two  reports  are  exemplary 
work  in  this  area.  Initial  summaries  reflect  that  a  possible 
sub-optimal  algorithm  may  prove  more  efficient  for  use  on  a 
multiple  CPU  architecture. 


Under  the  Project  Forecast  II  Program,  classes  of 
network  algorithms  are  needed  to  manage  the  PF  II  type  of 
network.  This  network  may  incorporate  parallel  processors  to 
perform  a  variety  of  user  and  resource  managment  functions. 
This  work  and  further  developments  will  feed  into  the  design 
and  implementation  of  the  specific  algorithms  to  be 
incorporated  in  the  Project  Forecast  II  627Q2F  Program. 
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This  report  gives  an  example  of  an  adaptive  communication  network  and  an 
operations  research  model  which  could  be  used  to  solve  it.  Consider  the 
3  node  communication  network  given  in  Figure  1. 

Suppose  lines  1,3, 5, 6,7  are  telephone  lines  and  2  and  4  are  micro  wave  links. 
Suppose  the  message  table  is: 


\To 

From 

1 

2 

3 

Totals 

1 

- 

5 

2 

7 

2 

1 

- 

0 

1 

3 

4 

7 

- 

11 

Totals 

5 

12 

2 

That  is,  5  messages  must  be  sent  from  node  1  to  node  2  during  the  next  time 
period. 

We  model  this  using  3  commoditites  (one  for  each  origin  node).  Then  the  supply 
and  demands  are  as  follows: 


rVV 


Commodity  1  2 

Supply 

Node  1  70 

Node  2  0  1 

Node  3  00 

Demand 

Node  1  01 

Node  2  50 

Node  3  20 

Requirement  (Supply  -  Demand) 
Node  1  '7  -1 

Node  2-51 
Node  3-20 


Suppose  that  node  1  can  communicate  with  node  2  using  either  telephone  lines  or 
micro  wave  links,  but  not  both  and  that  node  1  can  communicate  with  node  3 
using  either  telephone  lines  or  micro  wave  links,  but  not  both.  Then  the 
management  decisions  are  to  determine  which  edges  to  use  and  how  many  messages 
should  be  assigned  to  each  link.  Note  that  messages  going  from  1  to  2  can  go 
from  1  to  3  and  then  from  3  to  2 


We  now  define  a  mathematical  model  corresponding  to  this  example. 
Subscripts 

k  -  denotes  the  commodity  (origin  node), 
i  -  denotes  the  node, 
j  -  denotes  the  edge. 

Constants 


cK  -  denotes  the  cost  of  sending  one  message  originating  at  k  along  j. 

Uj  -  denotes  the  capacity  of  edge  j. 

A  -  denotes  the  node-arc  incidence  matrix  corresponding  to  the  graph. 

M  -  denotes  the  sum  of  all  supplies  for  all  commodities. 
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For  this  example: 


(edges) 


A 


1 

-1 

0 


1  1 
-1  0 
0  -1 


1  0 

0  1 

-1  -1 


0  -1 

-1  0 

1  1 


(node  1) 
(node  2) 
(node  3). 


M  -  19. 

rk  -  denotes  the  requirement  at  node  i  of  commodity  k. 
i 

rk  -  denotes  the  vector  of  requirements  for  commodity  k 


For  this  example: 


r  1  «= 

7 

-5 

-2 

r  2  =  | 

9  ^ 

-1 

1 

0 

! 

r  3  = 

-4 

-7 

11 

_  ____ 

_  _ 

1 . 

— 

f j  -  denotes  the  fixed  cost  for  using  edge  j. 
n  -  denotes  the  number  of  edges, 
m  -  denotes  the  number  of  nodes. 


Decision  Variables 

yj  *  0,  if  edge  j  is  not  used  and  1  otherwise. 

-  denotes  the  flow  of  commodity  k  on  edge  j. 
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minimize 


(1) 


I  I  ck  xk  +  I  f  y. 
k=l  j=l  j  j  j=l  j  ^ 


A  k  k 

Ax  =  r 

,  k 

E  1 1 . . .  m 

(2) 

n 

k 

I 

k=l 

x .  < 

J  ~ 

U3  *  j 

=  1 , . . .  ,n 

(3) 

n 

k 

My.j,  j 

r 

k-1 

x .  < 

J  - 

*  1, . . . ,n 

(4) 

>1 

<  l 

(5) 

'3 

+  '4 

<_  l 

(6) 

k 
x  . 

J 

_>  0,  all  j  ,k 

(7) 

y3 

e  {0, 

1),  all 

j  • 

(8) 

Constraints  (2)  ensure  that  the  messages  begin  and  terminate  at  the  correct 
nodes.  These  are  sometimes  called  flow  conservation  constraints.  Con¬ 
straints  (3)  ensure  that  the  capacity  (bandwidth)  of  the  edges  is  not  exceeded. 
Constraints  (4)  ensure  that  yj  ■  1  if  any  messages  are  assigned  to  edge  j. 
Constraint  (5)  ensures  that  at  most  one  of  the  edges  1  and  2  are  used. 
Constraints  (7)  ensure  that  messages  travel  in  the  proper  direction. 
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PREFACE 

This  paper  presents  a  new  algorithm  for  the  solution  of  a  network  problem  with  equal  flow  side 
constraints.  The  solution  technique  is  motivated  by  the  desire  to  exploit  the  special  structure  of  the 
side  constraints  and  to  maintain  as  much  of  the  characteristics  of  pure  network  problems  as  possi¬ 
ble.  The  proposed  algorithm  solves  the  equal  flow  problem  using  two  sequences  of  pure  network 
problems.  One  sequence  corresponds  to  computing  a  lower  bound  while  the  other  corresponds  to 
computing  an  upper  bound.  Step  sizes  exist  such  that  both  bounds  converge  to  the  optima]  ob¬ 
jective  value.  Termination  when  the  difference  between  the  bounds  is  within  a  prespecified  toler¬ 
ance  is  a  particularly  attractive  feature  of  the  solution  procedure  employed.  The  algorithm  has  been 
tested  on  problems  with  up  to  1500  nodes  and  6000  arcs.  Computational  experience  indicates  that 
feasible  solutions  whose  objective  function  value  is  within  10%  of  the  optimum  can  be  obtained 
in  6%  of  the  time  required  for  MPSX  to  obtain  an  optimum.  Guaranteed  5%  solutions  can  be 
obtained  in  17%  of  the  MPSX  time 
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I.  INTRODUCTION 


This  paper  presents  a  new  technique  to  solve  the  linear  equal  flow  problem.  The  problem 
is  easily  conceptualized  as  a  minimal  cost  network  flow  problem  with  additional  constraints  on 
certain  pairs  of  arcs.  Specifically,  given  pairs  of  arcs  are  required  to  take  on  the  same  value.  The 
problem  is  defined  on  a  network  represented  by  an  m  x  n  node-arc  incidence  matrix,  A,  in  which 
K  pairs  of  arcs  are  identified  and  required  to  have  equal  flow.  Mathematically,  this  is  expressed  as: 

Minimize  cx 
s.t.  Ax  e  b 

xk  =  ^ + k  •  k  =  1,2,... ,K 
0  £  x  £  u 

where  c  is  a  1  x  n  vector  of  unit  costs,  b  is  an  m  x  1  vector  of  node  requirements,  0  is  an  n  x  1  vector 
of  zeroes,  x  is  an  n  x  1  vector  of  decision  variables,  and  u  is  an  n  x  1  vector  of  upper  bounds.  This 
mathematical  statement  of  the  problem,  henceforth  referred  to  as  problem  PI,  assumes  that  the  first 
2K  arcs  appear  in  the  equal  flow  constraints.  This  is  not  a  restrictive  assumption,  since  by  rear¬ 
ranging  the  order  of  the  arcs,  any  equal  flow  problem  with  K  pairs  can  be  expressed  in  the  above 
form.  Note  that  the  K  pairs  of  arcs  are  mutually  exclusive,  i.  e.,  an  arc  appears  in  at  most  one  side 
constraint.  We  also  assume  without  loss  of  generality,  that  u,  =  uk  +  K  for  k  =  1,2,...,K. 

When  the  flow  in  arcs  must  be  integral,  the  problem  is  referred  to  as  an  integer  equal  flow 
problem.  Applications  of  the  integer  model  include  crew  scheduling  [5],  estimating  driver  costs 
for  transit  operations  [  14],  and  the  two  duty  period  scheduling  problem  [11].  The  linear  equal 
flow  problem  is  a  natural  relaxation  for  the  integer  problem  and  also  provides  an  approximation 
to  the  integer  model.  The  linear  model  is  applicable  to  problems  where  integrality  is  not  restrictive. 
For  example,  in  federal  matching  of  funds  allocated  to  various  projects  [4]. 
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The  linear  equal  flow  problem  may  be  solved  using  a  specialization  of  the  simplex  method 
for  networks  with  side  constaints  [3].  It  has  also  been  solved  by  transformation  to  a  nonlinear 
programming  problem  [4].  By  exploiting  the  special  structure  of  the  side  constraints  and  the 
network  structure,  this  paper  develops  a  new  algorithm  which  results  in  a  decrease  in  both  computer 
storage  and  computation  time  The  procedure  employs  relaxation  and  decomposition  and  solves 
the  equal  flow  problem  using  two  sequences  of  pure  network  problems,  totally  eliminating  the 
computational  overhead  associated  with  maintaining  a  basis  matrix.  The  computational  efficiencies 
in  specialized  software  for  the  solution  of  pure  networks  also  extend  to  the  solution  of  sequences 
of  minimum  cost  network  flow  problems  by  using  reoptimization  procedures.  The  reoptimization 
procedures  are  used  in  solving  the  subproblems  of  the  two  sequences. 

The  use  of  relaxation  techniques  and/or  decomposition  techniques  in  the  solution  of  prob¬ 
lems  with  special  structure  in  the  constraint  set  is  motivated  by  potential  computational  efficiencies. 
Glover,  Glover  and  Martinson  [6]  address  a  generalized  network  problem  in  which  arcs  in  speci¬ 
fied  subsets  must  have  proportional  flow.  The  solution  approach  is  via  solution  of  a  series  of 
problem  relaxations  and  progressive  bound  adjustment.  The  underlying  principle  is  shared  in  the 
ensuing  development  for  the  equal  flow  problem. 

Lagrangean  relaxation  has  been  used  to  aid  in  the  solution  of  the  integer  equal  flow  problem 

in  two  specific  instances.  Shepardson  and  Marsten  [11]  reformulate  the  two  duty  period  sched¬ 
uling  problem  as  a  single  duty  period  scheduling  problem  with  equal  flow  side  constraints  and 
integrality  constraints  on  the  variables.  Tumquist  and  Malandraki  [  14]  model  the  problem  of 
estimating  driver  costs  for  transit  operations  as  an  integer  equal  flow  problem.  In  both  studies,  the 
side  constraints  are  dualized  and  the  Lagrangean  dual  solved  using  subgradient  optimization  to  yield 
a  lower  bound  on  the  optimal  objective  value.  In  [  14]  step-size  determination  during  the  sub¬ 
gradient  optimization  process  is  aided  by  a  line  search.  The  Lagrangean  relaxation,  of  course,  does 
not  enforce  the  equal  flow  constraints.  The  Lagrangean  dual  for  the  linear  or  the  integer  equal  flow 
problem  is  exactly  the  same,  since  the  constraint  set  for  the  Lagrangean  relaxation  is  identical.  This 
Lagrangean  dual  is  similar  to  the  quadratic  programming  problem  used  in  [4],  The  similarity  lies 


in  penalizing  the  violating  equal  flow  constraints.  The  penalty  chosen  in  the  quadratic  problem 
formulation  should  be  sufficiently  large  to  guarantee  convergence  to  the  optimal  solution. 

The  objective  of  this  investigation  is  to  develop  and  computationally  test  a  new  algorithm  for 
the  linear  equal  flow  problem  The  solution  technique  consists  of  solving  two  sequences  of  pure 
network  problems.  One  sequence  progressively  yields  tighter  lower  bounds  on  the  optimal  value 
by  using  the  Lagrangcan  relaxation  of  the  equal  flow  problem  with  the  side  constraints  dualized. 
The  second  sequence  progressively  yields  upper  bounds  on  the  optimal  value  for  the  problem  and 
maintains  a  feasible  solution  at  all  times.  This  sequence  is  obtained  by  use  of  a  decomposition  of 
the  equal  flow  problem  based  on  parametric  changes  in  the  requirements  vector.  The  solution 
procedure  has  the  added  attractive  feature  that  it  provides  a  feasible  solution  which  is  known  to  be 
within  a  percentage  of  the  optimal  at  all  times.  As  such,  the  algorithm  terminates  when  a  solution 
with  a  prespecified  tolerance  on  the  objective  function  value  is  obtained. 

The  solution  technique  makes  use  of  subgradient  optimization  in  the  solution  of  the  lower 
and  the  upper  bounding  problems.  Both  the  lower  and  upper  bounding  algorithms  have  been  de¬ 
veloped  in  the  context  of  the  general  subgradient  algorithm  which  is  briefly  presented  in  Section  II. 
Section  III  introduces  the  Lagrangean  dual  for  the  equal  flow  problem  and  the  lower  bounding 
algorithm.  Section  IV  presents  the  decomposition  of  the  equal  flow  problem  and  the  upper 
bounding  algorithm.  The  overall  procedure  which  makes  use  of  the  algorithms  of  Sections  III  and 
IV  is  given  in  Section  V,  computational  results  are  given  in  Section  VI  and  conclusions  drawn  in 
Section  VII. 


III.  THE  LOWER  BOUND 


fj 


A  lower  bound  on  the  objective  function  of  the  linear  equal  flow  problem,  PI,  can  be  ob¬ 
tained  by  using  the  Lagrangean  dual  of  the  problem.  The  lower  bound  is  used  in  the  step  size  de¬ 
termination  as  well  as  in  termination  criteria  for  the  upper  bound  procedure.  Associating  the 
Lagrange  multiplier  wk  with  the  kth  equal  flow  constraint  and  defining  the  K-vector  w  =  (w|t  w2, 
.  .  .,  wK),  the  Lagrangean  dual  for  PI,  referred  to  as  problem  Dl,  may  be  stated  as 

maximize  h(w) 
w  €  RK 

where  h(»v)  =  min{cx  +  Xk  w^x^x**,,)  |  Ax  *  b,0  i  x  S  u).  Since  PI  is  a  linear  program, 
it  is  easily  established  that  the  optimal  objective  values  of  PI  and  Dl  are  equ»J  and  that  any  feasible 
solution  to  Dl  provides  a  lower  bound  on  the  optimal  objective  value  for  PI.  For  any  given  value 
of  the  vector  w,  the  Lagrangean  relaxation  is  a  pure  network  problem.  The  subgradient  of  h  at  a 
point  w  is  given  by  the  K-vector 
(  x,  -  XK4.| . xK  -  x2K) 

where  x  solves  the  Lagrangean  relaxation  at  w,  given  by 


{min  cx  +  Xk  wk(xk-xK.,k)  |Ax  =  b,0  i  x  S  u). 


ALGORITHM  1  assumed  the  function  f(y)  to  be  convex,  whereas  h(w)  is  piece-wise  linear 
concave.  The  lower  bounding  algorithm,  ALGORITHM  2,  modifies  the  framework  of  the  previ¬ 
ous  algorithm  for  a  concave  function.  The  step  sizes  used  are  given  by  =  p,  and  A,  =  A.,.,/2. 
The  algorithm  makes  use  of  a  scalar,  UBND,  representing  an  upper  bound  for  the  problem.  Since 
the  solution  procedure  progressively  improves  both  the  lower  bound  and  the  upper  bound  for  the 
equal  flow  problem,  each  time  the  lower  bound  algorithm  is  invoked  the  value  for  UBND  is  ob¬ 
tained  from  the  upper  bound  procedure.  For  this  algorithm,  we  assume  that  both  bounds  are 
greater  than  zero 


II.  THE  SUBGRADIENT  ALGORITHM 


The  subgradient  algorithm  was  first  introduced  by  Shor  [  13]  and  provides  a  framework  for 
solving  nonlinear  programming  problems.  It  may  be  viewed  as  a  generalization  of  the  steepest  de¬ 
scent  (ascent)  method  for  convex  (concave)  problems  in  which  the  gradient  may  not  exist  at  all 
points.  At  points  at  which  the  gradient  does  not  exist,  the  direction  of  movement  is  given  by  a 
subgradient.  Subgradients  do  not  necessarily  provide  improving  directions  and  consequently,  the 
convergence  results  of  Zangwill  C  15]  do  not  apply.  Convergence  of  the  subgradient  algorithm  is 
assured,  however,  under  fairly  minor  conditions  on  the  step  size. 

Given  the  nonlinear  program  PO, 

Minimize  f(y) 

s.t  y  6  G 

where  f  is  a  real-valued  function  that  is  convex  over  the  compact,  convex,  and  nonempty  set  G,  a 

vector  i]  is  a  subgradient  of  f  at  y'  if  f(y)  -  %*)  >  q(y  -  y')  for  all  y  e  G.  For  any  given  y'  G  G,  the 
set  of  all  subgradients  of  f  at  y'  is  denoted  by  df(y')  Moving  a  sufficiently  large  distance  s  along 
-h  can  yield  a  point  x  -  y'  -  st|  such  that  x  $  G.  The  projection  of  the  point  x  onto  G,  denoted 

by  P  [  x  ] ,  is  defined  to  be  the  unique  point  y  e  G  that  is  nearest  to  x  with  respect  to  the  Euclidean 
norm.  Using  the  projection  operation,  the  subgradient  algorithm  in  its  most  general  form  follows: 


ALGORITHM  1:  SUBGRADIENT  OPTMIZATION  ALGORITHM 


0  Initialization 

Let  y°  e  G, 

Select  a  set  of  step  sizes  s0,  s,, 

i  -  0. 

/  Find  Subgradient 
Let  ti,  e  df(y‘). 

If  Tj|  —  0,  then  terminate  with  y'  optimal. 

2  Move  to  new  point 

ym  4_  P[y.  .  s.t],] 

i  «-  i  +  I ,  and  return  to  step  I . 

There  are  three  general  schema  which  can  be  used  in  determining  the  step  size  when  the 
subgradient  algorithm  is  implemented  for  a  specific  problem: 

i.  s(  =  X, 

ii.  5j  =  X,/||t,,IP 

iii.  Si  =  Xs(f(yl)  -  F)/|tU> 

where  F  is  an  estimate  of  f*,  the  optimal  value  of  f  over  G.  A  summary  of  the  known  convergence 
results  for  this  algorithm  may  be  found  in  [2]  and  [103. 


s  \  \ 


ALGORITHM  2:  LOWER  BOUND  ALGORITHM 


/  Initialization 

Initialize  UBN'D,  step  size  p,  and  tolerance  t. 

W  0. 

2  Find  Subgradient 

Let  x  solve  h(w)  -  min(cx  +  |  Ax  -  b,  0  £  x  £u). 

LBND  «-  h(w). 

If  (UBND  -  LBND)  £  t(UBND),  terminate;  otherwise, 
d  <-  (  x,  -  xK*,,  .  .  .,xK  -  xJK). 

3  Move  to  new  point 

(a)  w  *-  w  +  pd,  p  *-  p/2. 

(b)  Go  to  step  2  . 
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IV.  THE  UPPER  BOUND 


An  alternate  formulation  of  problem  PI,  referred  to  as  P2,  obtained  by  decomposing  the 


problem  is  given  by 


Minimize  g(y) 
s.t.  y  e  S 


where  for  any  vector  y  =  (y, ,  y2 . yK), 

g(y)  =  {min  cx  |Ax  =  b;  0  £  x  £  u;  x*  =  xK,,  =  y„,  k  =  1,2,..,K), 


S  =  {y|0Syis  u„  for  k  =  1,2 . K). 


The  decomposition  assures  the  satisfaction  of  the  equal  flow  constraints.  The  decomposed  problem 
P2  is  equivalent  to  the  problem  PI  [12]  and  may  be  solved  using  a  specialization  of  the  subgra¬ 
dient  optimization  algorithm.  The  objective  function  is  piece-wise  linear  convex  and  the  subgra¬ 
dient  q  of  g  at  a  point  y  is  obtained  from  the  dual  variables,  v(,  i  =  1,2 . 2K,  associated  with  the 

equal  flow  constraints  in  the  subproblem,  referred  to  as  P3  and  given  by, 


Minimize 


=  yi 


=  y. 


(VK  + 1) 


=  y* 


0  £  x  £  u 


■w 


The  K-vector 


T1  =  (Vi^-V-i.Vj+V'k., . vK  +  v3K) 

is  a  subgradicni  of  g  at  y  =  (y , ,y2 . >’«)• 

The  dual  variables  vk,  k  =  1,2 . 2K  may  easily  be  constructed  from  the  solution  to  the  pure 

network  problem,  referred  to  as  problem  P4; 

{min  cx  |  Ax  =  b,  y  i  x  s  0  ), 

where  the  lower  and  upper  bound  n-vectors  y  and  0  are  defined  by 

yk  =  6k  =  y,,  k  =  1,2 . K 

Yk*«  ~  ®K-*k=  y»‘  ^  —  1.2,  ,K 

yk  =  0,  0k  =  uk,  k  =  2K  +  l,...,n. 

Let  n  be  the  vector  of  optimal  dual  variables  associated  with  the  conservation  of  flow  constraints, 
Ax  -  b  in  P4  and  the  arc  associated  with  the  variable  x,  be  incident  from  node  j,  and  incident  to 
node  j,.  The  optimal  dual  variables  for  P3  are  given  by, 
vk  =  -n^  +  nk(  +  ck,  k  =  1,2,. ..,2K. 

In  using  the  subgradient  optimization  algorithm  for  the  decomposed  problem  at  each  point  y,  the 
subgradient  ri  can  be  calculated  directly  using  the  above  development. 

It  is  possible  that  moving  a  step  along  the  negative  subgradient  yields  a  point  which  docs  not 
belong  to  the  set  S.  As  pointed  out  in  Section  II,  this  point  is  projected  onto  the  set  S  by  means 
of  a  projection  operation  in  the  algorithm.  For  this  model,  the  projection  operation  decomposes 
on  k  so  that  P[y]  =  (P[y,],  P[y,],  .  .  .  ,  PCyK3)  where  the  projections  PCy,]  are  defined  by: 
If y*  <  0.  P[yk]  =  0. 

If  y„  >  uk,  PCyJ  =  uk. 

IfO  s  y»  s  Uk,  PCy,]  =  yk. 
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The  subgradient  optimization  algorithm  for  problem  P2  makes  use  of  a  lower  bound,  LBND,  on 
the  optimal  objective  value  which  is  used  in  step  size  determination  using  a  variant  of  scheme  (iii) 
given  in  Secti  on  II,  as  well  as  in  the  termination  criterion  each  time  the  procedure  is  invoked. 
Again,  we  as*  rnt  that  both  bounds  are  greater  than  zero. 


ALGORITHM  3:  UPPER  BOUND  ALGORITHM 


/  Initialization 


Select  v  e  S. 


Initialize  LBND  and  c. 


2  Find  subgradient  and  step  size 


Let  x  and  fl  be  the  vectors  of  optimal  primal  and  dual  variables 


for  Min  {ex  j  Ax  =  b,  y  £  x  £  6  }. 


UBND  -  ci. 


If  (UBND  •  LBND)  s  e(LBND),  terminate  with  x  optimal; 


otherwise, 


V,  -  -nkf  +  nk(  +  c„  k  =  1,2,. ..,2K. 


h  (V|  +  vK * ,,  .  .  .,  vK  -+  vJK). 


3  Move  to  new  point 


(a)  y  —  PCy  -((UBND  -  LBND)/(2||t) |P))i) ] . 


(b)  Go  to  2. 
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V.  THE  ALGORITHM 


The  solution  of  the  equal  flow  problem  using  decomposition,  as  given  in  the  previous  section 
can  be  implemented  without  the  lower  bound  procedure.  It  is  also  possible  to  implement  the  lower 
bound  algorithm  independently  for  the  purpose  of  obtaining  a  lower  bound  on  the  optimal  value 
of  the  equal  flow  problem.  For  the  upper  bound  problem,  some  measure  of  the  lower  bound  on 
the  problem  must  be  used  to  aid  in  termination.  By  merging  the  two  procedures,  an  algorithm 
which  adjusts  the  lower  and  upper  bounds  progressively  can  be  used  to  advantage.  Not  only  can 
such  a  procedure  be  used  for  obtaining  feasible  solutions  with  relative  case,  but  it  can  also  provide 
a  measure  of  how  close  this  solution  is  to  the  optimal. 

The  algorithm  for  the  solution  of  the  equal  flow  problem  iterates  between  the  lower  bound 
procedure  and  the  upper  bound  procedure.  The  lower  and  upper  bounds,  LBND  and  UBND, 
progressively  become  tighter,  closing  in  on  the  optimal  solution  to  the  problem.  Each  time  the 
lower  bound  procedure  is  invoked,  a  maximum  of  ITERL  iterations  are  performed.  Each  time  the 
upper  bound  procedure  is  invoked,  a  maximum  of  ITERU  iterations  are  performed.  The  tuning 
parameters  for  the  algorithm  are  as  follows:  ITERL,  ITERU,  p  (the  initial  step  size),  and  e  (the 
termination  criterion.) 


ALGORITHM  4:  RELAXATION/DECOMPOSITION  ALGORITHM 
FOR  THE  EQUAL  FLOW  PROBLEM 


.S3 


1 


1 

m 

3 


0  Initialization 

Initialize  ITERL,  ITERU,  p,  t. 

T  «-  0,  R  «-  0,  w  «-  0,  UBND  «-  oo,  LBND  -  -oo. 

1  Compute  Lower  Bounds 

(a)  Call  ALGORITHM  2  (Steps  2  and  3  (a». 

(b)  T  -  T+l 

If  T  <  ITERL,  then  go  to  stop  1  (a). 

2  Compute  L  mer  Bounds 

(a)  Call  ALGORITHM  3  (Steps  2  and  3  (a)). 

(b)  R  «-  R+  1 

If  R  <  ITERU,  then  go  to  step  2  (a). 

3  Reset  iteration  counts 

T  <-  0,  R  —  0,  and  go  to  step  1 . 

The  initial  equal  flow  allocation  in  the  upper  bound  procedure  makes  use  of  the  solution  x  to  the 
last  pure  network  flow  problem  solved  in  the  lower  bound  procedure.  The  allocation  for  each  of 
the  K  pairs  of  equal  flow  constraints  is  determined  by: 

y„  «-  min[  u„  (xk+xK.k)/2  J  k  =  1,2 . K. 

All  subsequent  entries  into  Step  2  of  the  upper  bound  procedure  use  the  most  recent  equal  flow 
allocation  in  the  previous  upper  bound  iteration. 
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VI.  COMPUTATIONAL  RESULTS 


The  computer  implementation  of  the  algorithm  is  written  in  standard  FORTRAN  and  not 
tailored  to  either  the  machine  or  FORTRAN  compiler  used  for  testing.  The  implementation,  called 

EQFLO,  makes  use  of  MODFLO  [  1  ]  to  solve  pure  network  subproblems.  MODFLO  is  a  set 
of  subroutines  which  may  be  used  to  solve  a  network  problem  as  well  as  reoptimize  after  problem 

data  changes.  Based  on  NETFLO  [8],  this  code  allows  the  user  to  change  costs,  bounds  and/or 
requirements  for  a  network  problem  and  reoptimize. 

The  algorithm  has  been  tested  on  a  set  of  10  test  problems  randomly  generated  using 
NETGEN  [9],  a  large-scale  network  problem  generator.  The  parameters  used  to  generate  test 
problems  are  described  in  Klingman,  Napier,  and  Siutz  [9].  The  test  problems  have  between  200 
and  1500  nodes,  and  between  1500  and  6600  arcs.  For  each  problem,  the  first  2K  arcs  were  paired 
to  form  K  equal  flow  side  constraints.  In  order  to  gauge  the  performance  of  the  algorithm  for 
various  values  of  K,  some  of  the  problems  were  generated  using  the  same  base  network  problem 
data  with  K  varying  from  75  to  200.  The  benchmark  NETGEN  problems  have  a  specified  per¬ 
centage  of  arcs  which  are  uncapacitated  For  these  arcs,  the  capacity  was  defined  to  be  the  maxi¬ 
mum  of  all  supplies  and  demands. 

Computational  testing  was  carried  out  on  the  IBM  308 ID  at  Southern  Methodist  University 
using  the  FORTVS  compiler  with  OPT  =  2.  In  order  to  assess  the  computational  gains  afforded 
by  the  decomposition/relaxation  algorithm  for  the  equal  flow  problem,  each  problem  was  solved 
using  MPSX  C  7  ] .  An  additional  point  of  interest  which  was  addressed  is  the  choice  of  model  for 
the  equal  flow  problem  when  using  a  general  linear  programming  system  such  as  MPSX. 

The  equal  flow  problem  ran  also  be  formulated  as  a  network  with  side  columns.  For  an  equal 
flow  problem  defined  on  a  network  with  m  nodes,  n  arcs  and  K  equal  flow  pairs,  the  side  constraint 
formulation  uses  m  +  K  constraints  and  n  variables.  The  side  column  formulation  has  m  con¬ 
straints  and  n  -  K  variables.  It  is  best  defined  by  partitioning  the  node-arc  incidence  matrix  A  ” 
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[T:N]  where  T  is  m  x  2K  and  N  is  m  x  n-2K.  The  matrix  T  contains  the  first  2K  columns  of 

A,  which  correspond  to  arcs  appearing  in  equal  flow  constraints.  The  side  column  formulation  is 
given  by 

minimize  fv  +  gz 
s.t.  Sy  +  Nz  =  r 
0  i  z  £  u 


0  s  y  s  l 

where,  letting  t„  and  s,  denote  the  ith  columns  of  T  and  S, 

s»  =  tk  +  tK  +  k,  k  =  1,2,... ,K 

fk  —  ck  "t*  c^^ij,  k  1,2,...,K 

Uk  =  u„  k  =  1,2 . K. 

Table  I  details  the  computational  testing  of  the  algorithm  with  parameters  ITERL  =  15, 
ITERU  =  10  and  p  =  .005.  Of  the  10  problems  used,  the  first  three  are  transportation  problems 
(problems  5,  9,  and  10),  the  next  four  are  capacitated  transshipment  problems  (problems  20,  21, 
24,  and  25)  and  the  last  three  are  uncapacitated  transshipment  problems  (problems  28,  30,  and  35). 
The  test  problems  were  formulated  using  both  the  side  columns  model  and  the  side  constraint 
model.  Both  models  were  solved  using  MPSX  with  default  tuning  parameters.  The  side  column 
formulation  ran  slightly  longer  than  the  side  constraint  formulation  on  MPSX,  even  though  it  uses 
75  fewer  constraints  and  75  fewer  columns  for  the  test  problems  in  Table  I. 

For  the  test  problems,  EQFLO  obtained  feasible  solutions  whose  objective  function  values 
were  within  10%  of  the  optimal  in  one-sixteenth  of  the  time  required  by  MPSX  to  obtain  an  op¬ 
timum.  For  the  equal  flow  problem  with  400  constraints,  2692  arcs  and  75  equal  flow  pairs,  a  10% 
solution  as  well  as  a  5%  solution  are  obtained  in  one-hundreth  of  the  time  required  for  solution 
by  MPSX.  For  the  1500  node  problem  with  5880  arcs  a  5%  solution  was  obtained  in  6%  of  the 
time  required  by  MPSX.  Guaranteed  5%  solutions  across  all  problems  were  obtained  in  about 
one-seventh  of  the  MPSX  time. 
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To  determine  the  behavior  of  EQFLO  as  the  number  of  side  constraints  increases  and  the 
stopping  tolerance  decreases,  additional  testing  with  problems  21,  24,  and  28  was  performed.  Each 
of  these  base  problems  was  used  to  generate  equal  flow  problems  with  75,  100,  150,  and  200  equal 
flow  constraints.  The  corresponding  side  column  model  was  also  generated.  Table  II  summarizes 
the  computational  experience.  The  new  algorithm  performs  favorably,  when  compared  to  MPSX, 
in  obtaining  guaranteed  5%  solutions.  For  the  400  node  problem  with  2904  arcs  and  100  equal 
flow  contraints,  a  1%  solution  is  obtained  in  3%  of  the  time  required  by  MPSX. 


VII.  SUMMARY  AND  CONCLUSIONS 


The  equal  flow  problem  lends  itself  to  solution  by  decomposition  and  relaxation.  The  use 
of  these  techniques  in  the  solution  procedure  developed  is  advantageous  because  the  essential  sol¬ 
ution  mechanism  required  is  the  solution  of  sequences  of  pure  network  problems.  By  dispensing 
with  the  working  basis  required  by  other  techniques,  not  only  are  computational  efficiencies  af¬ 
forded  but  the  natural  characteristics  of  the  problem  enhanced. 

The  algorithm  can  be  modified  to  assist  in  the  solution  of  the  integer  equal  flow  problem. 
The  lower  bound  automatically  produces  integer  flows  and  the  projection  of  the  subgradient  in  the 
upper  bound  routine  can  be  altered  to  require  integrality  on  the  equal  flow  allocation.  Thus  this 
solution  procedure  not  only  provides  lower  bounds  on  the  integer  equal  flow  problem  efficiently, 
but  it  also  has  the  inherent  capability  of  producing  feasible  integer  solutions  with  ease. 

The  development  for  the  linear  equal  flow  problem  in  this  paper  can  be  instructive  in  mod¬ 
elling  and  solving  other  network  problems  with  specially  structured  side  constraints  such  as  pro¬ 
portional  flow  models  used  in  manpower  planning.  The  solution  technique  is  best  suited  for  a 
real-world  situation  in  which  one  must  quickly  produce  near  optimal  solutions. 
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I.  INTRODUCTION 


OTAS  is  a  FORTRAN  code  that  simulates  the  optimal  assignment  of  voice  and 
data  traffic  onto  the  trunks  available  to  a  node.  Voice  and  data  requests  fol¬ 
low  a  Poisson  arrival  process  and  the  service  duration  is  exponential.  Hence, 
at  a  given  clock  time  one  can  have  a  voice  request,  a  data  request,  or  a  dis¬ 
connect  from  a  previous  request.  A  request  is  assigned  on  the  least  cost  trunk 
which  has  the  available  capacity  and  delay  requirements. 


II.  THE  MODEL 


The  integer  programming  model  describes  mathematically  the  criteria  for 
assigning  a  service  request  to  an  available  trunk.  If  no  trunk  is  available 
the  model  has  no  solution  and  the  service  is  denied  in  the  simulation. 


2.1  Subscript 

Let  i  -  denote  a  trunk 

2.2  Constants 

Let  d  -  denotes  the  maximum  allowable  delay  of  the  requested 
service  (micro-secs). 

c  -  denotes  the  required  capacity  of  the  requested  service 
(K  bits/sec). 

-  denotes  the  delay  of  the  ith  trunk  (micro-secs). 

Ci  -  denotes  the  capacity  of  the  ith  trunk  (K  bits/sec). 

a£  -  denotes  the  availablity  of  the  ith  trunk  (%). 

Ui  -  denotes  the  current  usage  of  the  ith  trunk  (K  bits/sec), 

v^  -  denotes  the  unit  cost  for  the  ith  trunk  ($  (K  bit/sec)). 

2.3  Decision  Variables 

Let  xi  -  be  1  if  the  service  is  assigned  to  trunk  i  and,  0  otherwise. 

2.4  Constraints 

(TRUNK  CAPACITY) 

cx^  <_  c^a^  -  u^:  for  all  i. 


■ . ' 


(SELECT  ONE) 


(INTEGRALITY) 

Xi  *  0,  1;  for  all  i 

(OBJECTIVE  FUNCTION) 
minimize  I  vixi 
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The  input  consist  of  three  files  corresponding  to  the  service  data,  the 
trunk  data,  and  a  control  file  to  manage  the  simulation  length  and  output. 


3.1  Tape  1:  Service  Input  Data 

(All  data  are  in  free  format  one  entry/line.  Please  omit  decimal  points 


for 

integer  data.) 

Cols 

Type 

Name 

Description 

1-80 

Integer 

INTV 

Inter-arrival  time  between  calls  (secs) 
(Assume  exponential  distribution) 

1-80 

Integer 

SERV 

Expected  call  duration  (secs) 

(Assume  exponential  distribution) 

1-80 

Integer 

INTD 

Inter-arrival  time  between  data  requests 
(secs) 

1-80 

Integer 

SERD 

Expected  service  duration  (secs) 

(Assume  exponential  distribution) 

1-80 

Real 

CAPV 

Service  capacity  required  for  voice 
(K  bits/sec) 

1-80 

Real 

CAPD 

Service  capacity  required  for  data 
(K  bits/sec) 

1-80 

Integer 

DELV 

Maximum  allowable  delay  for  voice  service 
(micro-secs) 

1-80 

Integer 

DELD 

Maximum  allowable  delay  for  data  service 
(micro-secs) 

3.2  Tape 

2:  Trunk  Input  Data 

(Name  must  appear 
format  one  entry/line, 
follows  trunk  1,  etc.) 

in  columns  1 
Please  omit 

through  8.  All  other  data  are  in 
decimal  points  for  integer  data. 

free 

Trunk 

Cols 

Type 

Name 

Description 

1-80 

Character 

NAME(I) 

Trunk  I  name  (8  characters) 
Examples  include: 

LL  «  Land  Line 

SAT  -  Satellite 

HF  -  High  Frequency 

LOS  *  Line  of  Sight 

1-80 

Real 

CAPAC(I) 

Trunk  I  capacity  (K  bits/sec) 

1-80 

Integer 

DELAY(I) 

Trunk  I  delay  time  (micro-secs) 

1-80 

Integer 

AVAIL(I) 

Trunk  I  availablity  (0%  -  100%) 

1-80 

Real 

COST ( I ) 

Trunk  I  unit  cost  ($/(K  bits/sec)) 

3.3  Tape 

3:  Simulation  Control  Data 

(All  data  are  in  free  format  one  entry/line.  Please  omit  decimal  points 
for  integer  data.) 


Cols 

Type 

Name 

Description 

1-80 

Integer 

T0TALT 

Total  time  of  simulation  in  seconds 

1-80 

Integer 

PRINT 

Level  of  intermediate  output 

0  -  No  output 

1  -  Print  status  after  1%  of  total  time 

2  -  Print  every  event 


FILE:  SERVICE  DATA 


A1  SOUTHERN  METHODIST  UNIVERSITY  --  CMS  RELEASE  4.0 


60 

240 

60 

300 

64. 

9.6 

250 

1000 


SECS 

SECS 

SECS 

SECS 

K  BITS/SEC 
K  BITS/SEC 
MICRO-SECS 
MICRO-SECS 


(60  CALL/HOUR) 

(4  MINUTES/CALL) 

(60  DATA  REQUESTS/HOUR) 

(5  MINUTES/REQUEST) 

(VOICE  CAPACITY  REQUIRED) 
(DATA  CAPACITY  REQUIRED) 
(MAX  VOICE  DELAr  ALLOWABLE) 
(MAX  DATA  DELAY  ALLOWABLE) 


FILE:  TRUNK 


DATA  A1  SOUTHERN  METHODIST  UNIVERSITY  --  CMS  RELEASE  4.0 


(LAND  LINE) 

TYPE 

192 

(K  BITS/SEC) 

CAPACITY 

100 

(MICRO-SECS) 

DELAY 

100 

(X) 

AVAILABILITY 

.0015 

(  $/K  BITS/SEC)  ) 

COST 

IT 

(SATELLITE) 

TYPE 

128 

(K  BITS/SEC) 

CAPACITY 

500 

(MICRO- SECS) 

50 

(X) 

AVA 1  LAB  1 L 1 TY 

.0007 

(  $/K  BITS/SEC)  ) 

COST 

(HIGH  FREQUENCY) 

TYPE 

48. 

(K  BITS/SEC) 

CAPACITY 

250 

(MICRO- SECS) 

DELAY 

100 

(%) 

AVAILABILITY 

.0003 

(  S/K  BITS/SEC)  ) 

COST 

IS 

(LINE  OF  SIGHT) 

TYPE 

128 

(K  BITS/SEC) 

CAPACITY 

150 

(MICRO-SECS) 

85 

(X) 

AVAILABILITY 

.0005 

(  $/K  BITS/SEC)  ) 

COST 

FILE:  CONTROL  DATA  A1  SOUTHERN  METHODIST  UNIVERSITY  —  CMS  RELEASE  4.0 


7200  (SECS) 
0 


20  MINUTE  RUN 
PRINT  LEVEL 


« \flrWlnrvw\fli  ini  -  vj^vwwwi 


O 


IV.  OUTPUT  REPORTS 


Three  reports  are  generated  by  the  simulation  on  files  7,  8,  and  9. 
Reports  1  and  2  give  a  summary  of  the  statistics  kept  on  voice  and  data 
service.  Report  3  gives  a  summary  of  the  activity  on  each  trunk.  Sample 
reports  follow: 


$ 


w!vtvWWw>?S,wS?S 


FILE:  REPORT  1  DATA  A1  SOUTHERN  METHODIST  UNIVERSITY  —  CMS  RELEASE  4.0 
1REPORT  1:  SERVICE  STATISTICS  FOR  VOICE  REQUESTS 


INPUT: 

EXPECTED  INTER-ARRIVAL  TIME 

60  (SECS) 

INPUT: 

EXPECTED  DURATION 

240  (SECS) 

INPUT: 

REQUIRED  CAPACITY 

64.0000  (K  BITS/SEC) 

INPUT: 

MAXIMUM  ALLOWABLE  DELAY 

250  (MICRO-SECS) 

OUTPUT: 

TOTAL  REQUESTS 

121 

OUTPUT: 

NUMBER  OF  BLOCKED  REQUESTS 

41 

OUTPUT: 

X  REQUESTS  BLOCKED 

33  (X) 

OUTPUT: 

TOTAL  SIMULATION  TIME 

7239  (SECS) 

FILE:  REPORT2  DATA  A1  SOUTHERN  METHODIST  UNIVERSITY  —  CMS  RELEASE  4.0 


1  REPORT  2: 

SERVICE  STATISTICS  FOR  DATA 

REQUESTS 

INPUT: 

EXPECTED  INTER-ARRIVAL  TIME 

60 

(SECS) 

INPUT: 

EXPECTED  DURATION 

300 

(SECS) 

INPUT: 

REQUIRED  CAPACITY 

9 . 6000 

(K  BITS/SEC) 

INPUT: 

MAXIMUM  ALLOWABLE  DELAY 

1000 

(MICRO-SECS) 

OUTPUT: 

TOTAL  REQUESTS 

124 

OUTPUT: 

NUMBER  OF  BLOCKED  REQUESTS 

0 

OUTPUT; 

X  REQUESTS  BLOCKED 

0 

(X) 

OUTPUT: 

TOTAL  SIMULATION  TIME 

7239 

(SECS) 

FILE:  REP0RT3  OATA 


At  SOUTHERN  METHODIST  UNIVERSITY 


1REPORT  3:  TRUNK  USAGE 


TRUNK  NUMBER: 

1 

NAME 

LL 

CAPAC 1 TY 

192.00  (K  BITS/SEC) 

DELAY 

100  (MICRO-SECS) 

AVAILABILITY 

100  (X) 

UNIT  COST 

0.00150  (  $/( K  BITS/SEC) 

TRUNK 

UTILIZATION 

UTILIZATION 

TIME  (SECS)  X  TOTAL  ' 

0-  to  % 

11-  20  % 

21-  30  X 

31-  40  X 

41-  50  X 

51-  60  X 

61-  70  X 

71-  80  X 

81-  90  X 
91-100  X 

846  11.69 

0  0.00 

0  0.00 

1274  17.60 

0  0.00 

0  0.00 

2422  33.46 

0  0.00 

0  0.00 

2697  37.26 

TOTAL 

7239 

1REP0RT  3:  TRUNK  USAGE 

TRUNK  NUMBER: 

2 

NAME 

SAT 

CAPACITY 

128.00  (K  BITS/SEC) 

DELAY 

500  (MICRO-SECS) 

AVAILABILITY 

50  (X) 

UNIT  COST 

0.00070  (  $/( K  BITS/SEC) 

TRUNK 

UTILIZATION 
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FILE:  REP0RT3  DATA 


A1  SOUTHERN  METHOOI ST  UNIVERSITY  —  CMS  RELEASE  4.0 


UTILIZATION 


TIME  (SECS) 


X  TOTAL  TIME 


0-  10  % 

6547 

90.44 

11-  20  J 

152 

2.  10 

21-  30  X 

474 

6.55 

31-  40  X 

66 

0.91 

41-  50  X 

0 

0.00 

51-  60  X 

0 

0.00 

61-  70  X 

0 

0.00 

71-  80  X 

0 

0.00 

81-  90  X 

0 

0.00 

91-100  X 

0 

0.00 

TOTAL 

7239 

1REPORT  3:  TRUNK  USAGE 

TRUNK  NUMBER:  3 

NAME  HF 

CAPACITY  48.00 

DELAY  250 

AVAILABILITY  TOO 

UNIT  COST  0.00030 


TRUNK 


48.00  (K  BITS/SEC) 

250  (MICRO-SECS) 

100  (X) 

0.00030  (  $/ ( K  BITS/SEC)  ) 

UTILIZATION 


riLIZATION 

TIME  (SECS) 

X  TOTAL 

o-  10  X 

144 

1.99 

11-  20  X 

607 

8.39 

21-  30  % 

0 

0.00 

31-  40  X 

1591 

21.98 

41-  50  X 

0 

0.00 

51-  60  X 

1128 

15.58 

61-  70  X 

0 

0.00 

71-  80  X 

1709 

23.61 

81-  90  X 

0 

0.00 

91-100  X 

2060 

28.46 

TOTAL 

7239 

IRE PORT  3:  TRUNK  USAGE 


mmsssmmmssm 


FILE:  REPORT 3  DATA 


A1  SOUTHERN  METHODIST  UNIVERSITY 


TRUNK  NUMBER: 

4 

NAME 

LOS 

CAPACITY 

128.00 

(K  BITS/SEC) 

DELAY 

150 

(MICRO-SECS) 

AVAILABILITY 

85 

<%) 

UNIT  COST 

0.00050 

(  $/( K  BITS/SEC)  ) 

TRUNK 

U  T  1  L  1 

1  Z  A  T  1  0  N 

UTILIZATION 

TIME  (SECS)  %  TOTAL  TIME 

0-  10  % 

613 

8.47 

11-  20  X 

147 

2.03 

21-  30  % 

215 

2.97 

31-  40  X 

0 

0.00 

41-  50  i 

2478 

34.23 

51-  60  X 

1170 

16.  16 

61-  70  % 

1523 

21.04 

71-  80  X 

1093 

15.  10 

81-  90  X 

0 

0.00 

91-100  X 

0 

0.00 

TOTAL 

7239 
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ABSTRACT 

This  paper  presents  a  parallelization  of  the  simplex  method  for  linear 
programming.  Current  implementations  of  the  simplex  method  on  sequential 
computers  are  based  on  a  triangular  factorization  of  the  inverse  of  the  current 
basis.  An  alternative  decompostion  designed  for  parallel  computation,  called 
the  quadrant  interlocking  factorization,  has  previously  been  proposed  for 
solving  linear  systems  of  equations.  This  research  presents  the  theoretical 
justification  and  algorithms  required  to  implement  this  new  factorization  in  a 
simplex-based  linear  programming  system. 


I.  INTRODUCTION 

The  introduction  of  parallel  computers  into  scientific  computing  in  the  past  decade 
is  the  beginning  of  a  new  era.  The  invention  of  new  algorithms  will  be  required  to  ensure 
realization  of  the  potential  of  these  and  future  architectural  improvements  in  computers. 
Already  the  use  of  parallel  computers  has  given  rise  to  studies  in  concurrency  factors, 
vectorization,  and  asynchronous  procedures.  These  have  led  to  multifold  increases  in 
speed  over  conventional  serial  machines  after  the  calculations  have  been  rearranged  to 
take  advantage  of  the  specific  hardware.  This  paper  presents  a  parallelization  of  the  sim¬ 
plex  algorithm  for  general  linear  programs.  Our  work  begins  with  new  results  for  solving 
systems  of  linear  equations  and  is  directed  toward  the  hardware  design  currently  adapted 
by  Sequent  Computer  Systems,  Inc.  of  Beaverton,  Oregon. 

The  following  notation  is  used  throughout  this  paper.  Let  B,.jik:l  represent  a  subma¬ 
trix  of  B  composed  of  rows  i  through  j  and  columns  k  through  /.  If  i=j  (k=l),  we  write 
The  jth  row  (column)  of  B  is  denoted  by  (B ,j).  The  ijth  element  of 

B  is  B,  ^ . 

The  linear  programming  problem  is  represented  mathematically  as  follows: 
minimize  cTx 
subject  to  Ax  —  b 
0  <x<u, 

where  A  is  a  known  m  by  n  matrix,  all  other  quantities  are  conformable,  and  all  vectors 
are  known  except  x . 

The  upper  bounded  version  of  Dantzig’s  simplex  method  for  solving  the  linear  pro¬ 
gramming  problem  may  be  stated  as  follows: 

Algorithm  l  ,1  :  The  Simplex  Method 


0.  Initialization 


Le:  [,tfl  1  jc'v ]  be  a  basic  feasible  solution  with  A  =  [B  IN].  Let  the  cost  vector 
[CB  |c,vj  bounds  [UB  |uVj  be  partitioned  similarly.  Assume  that/?-1  is  avail¬ 
able  in  some  factored  form.  Initialize  iter  to  0  and  the  reinversion  frequency, 
freq. 

1.  Calculate  the  Dual  Variables  (BTRAN) 

n*-cBB~l.  (1.1) 


2.  Pricing 

Let  K  i  =  [j :  x1}  =  0  and  c j1’  -  nN <  0}, 
and  K2  =  {j:  x*  =  u*  and  c;S  -  JtN  ;-  >  0 }. 

If  A'i  K2  =  O,  terminate  with  [xB  l.rv]  optimal; 


Xk  <-  X?  +  A5 


xB  <-xD  -  A5\ 


If  A  =  u'k ,  return  to  1. 


6.  Basis  Inverse  Update 


Let  p  denote  the  index  of*3  which  produced  A  and  set 


/}>  ,  if  i  *p 


1  Jyp  ;  otherwise , 


£  *—  /  —  ep  ej  +  TicJ" 


B~x  <-££-'. 


7.  Reinversion  Check 


iter  4-  /rer  +  1. 


If  mod  ( iter  Jreq )  =  0,  then  refactor  B~l. 

Return  to  1  using  £-1  as  B~l,  the  current  basis  inverse. 


Two  of  the  most  common  factorizations  of  the  basis  matrix  inverse  are  the  product 


form  and  the  elimination  form,  which  correspond  to  the  methods  for  solution  of  linear 


equations  known  as  Gauss-Jordan  reduction  and  Gauss  reduction  (LU  factorization). 


respectively,  where  L  is  a  lower  triangular  matrix  and  U  is  an  upper  triangular  matrix. 


The  elimination  form  produces  a  sparser  representation  of  the  basis  inverse  than  the  pro¬ 


duct  form,  and  accordingly  leads  to  faster  implementation  of  a  simplex  iteration  and  a 


considerable  savings  in  storage. 


Historically,  the  elimination  form  of  the  inverse,  due  to  Markowitz  [1957-1],  was 


the  first  LU  factorization  method  and  was  introduced  to  preserve  sparsity  during  reinver¬ 


sion.  However,  once  reinversion  was  completed  further  pivot  operations  were  handled 


MM 


i* 


using  product  form.  Bands  and  Goiub  proposed  updating  L  and  U  in  a  numerically 
stabie  way.  (see  Bartels  [1971-1]).  Their  updating  scheme  tends  to  promote  the  growth  of 
nonzeros  in  U,  leading  to  a  potentially  severe  loss  of  sparsity.  Forrest  and  Tomlin  [1972- 
1]  designed  a  different  updating  scheme  for  the  triangular  factors  to  preserve  sparsity  at 


some  sacrifice  in  numerical  stability.  Subsequent  implementation  of  the  Banels-Golub 
method,  designed  by  Reid  [1982-1]  and  Saunders  [1976-1],  combine  the  virtues  of  accu¬ 
racy  and  speed. 

Several  parallel  versions  of  the  LU  factorization  algorithm  for  solving  general  linear 
systems  of  equations  are  presently  available  (Chen  et  al.  [1984-1]  and  Dongarra  and 
Sorensen  [1984-2]).  All  versions  are  based  on  restructuring  the  original  serial  algorithm 
to  reveal  possible  independent  tasks  that  can  be  carried  out  concurrently. 

Evans  and  Hitzopoulos  [1979-1]  proposed  a  matrix  factorization,  called  the  Qua¬ 
drant  Interlocking  Factorization  (QIF),  as  an  appropriate  tool  for  solving  linear  systems 
on  parallel  computers.  The  QIF  is  similar  to  the  LU  factorization,  but  is  claimed  to  be 
more  suitable  for  concurrent  computation. 

This  paper  presents  a  parallelization  of  the  simplex  method  using  the  QIF.  The  out¬ 
line  of  the  paper  is  as  follows.  In  Section  II,  the  QIF  is  developed.  An  algorithm  for 
updating  the  QIF  of  B~]  is  presented  in  Section  ID.  Mathematically,  the  problem  is  to 
efficiently  obtain  a  factorization  of  B~ 1  (see  step  6  of  Algorithm  1.1)  from  the  factoriza¬ 
tion  of  B~l.  In  Section  IV,  we  develop  a  parallelization  of  the  reinversion  routine  used  in 
step  7  and  propose  a  parallel  implementation  of  both  the  BTRAN  and  FTRAN  operations 
of  steps  1  and  3. 

The  parallel  algorithms  presented  in  this  study  are  designed  for  a  MIMD  parallel 
computer  that  incorporates  p  identical  processors  sharing  a  common  memory  and  capa¬ 
ble  of  applying  all  their  power  to  a  single  job  in  a  timely  and  coordinated  manner.  The 
Balance  Systems  8000  and  21000  from  Sequent  Computer  Systems  are  examples  of  such 
machines. 
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n.  THE  QUALRANT  INTERLOCKING  FACTORIZATION 


Ln  this  section  we  describe  a  matrix  factorization  suggested  by  Evans  and  Hatzo- 
poulos  [1979-1]  known  as  the  Quadrant  Interlocking  Factorization  (QIF).  This  decom¬ 
position  is  designed  to  solve  linear  systems  on  parallel  computers  (see  Evans  and  Hatzo- 
poulos  [1979-1],  Evans  and  Hadjidimos  [1980-1],  Evans  [1982-1]  and  Feilmeier  [1982- 
1]).  The  factors  and  some  of  their  characteristics  are  described  in  Secdon  2.1.  We  show 
that  any  nonsingular  matrix  can  be  factorized  into  its  QIF  in  two  ways,  the  Forward  QIF 
and  the  Backward  QIF  The  factorization  algorithms  are  developed  in  Sections  2.2  and 
2.3.  The  relationship  of  quadrant  and  triangular  matrices  is  presented  in  Section  2.4. 


2.1  The  Quadrant  Interlocking  Factors 


Consider  the  following  matrix 


1  0 
H  ;  1  1 

Wl  1  W  7 


Hm-2.1  —  2.' 

m  - 1 , 1  0 


u  ™2jn 

W3.m-1  VV3/n 


-2  jn  - 1  ~2jn 

1  - 1  jn 


Note  that  the  non-arbicrary  entries  of  W  are  given  by 
w,  j  0.1=1 . [m/2]  ,  ;'=(i+l) . (m-i'+l); 

I  0 ,/=m m  , )=m-i+l,...,i-l; 

L 

where 


fV  JVl  »  m F  *  J*  ■  “  J  *  *  *■•»•«**  *  *  »  %  *  •'  “  J  f  f  ^  ^  •  » 


"s*  '  s  V  *V 

*  ^  >  **-  "A.  O.  ■.  - 


[.v  ]  =  the  largest  integer  not  greater  than  the  value  of  x 
m=  m  +  1  -  [mil]. 

Also,  consider  the  matrix 


2 1.1  2  1.2  •  •  ■  ’  l,m-l  2\jn 

0  2  2.2  ■  ■  ■  2zm-l  0 
0  0  .  .  .  0  0 

Z  =  .  .  .  (2.3) 

o  o  :::  6  6 

0  2/n  — 1.2  ■  •  •  0 

Lm.  1  ,2  *  •  •  Zm  ,m  —  \  ,m  I 


Note  that 

» 

7=1 t(m  — 1)/2]  , i=/+l m-;'; 

zt  j  =  0,  i  (2.4) 

j-[m /2]-r2....jn  ,  i=m+2-j,...,j-\. 

Any  square  matrix  may  be  panitioned  by  its  diagonal  and  secondary  diagonal  into 
four  quadrants.  The  potentially  nonzero  elements  of  W  are  in  the  left  and  right  quadrants 
while  those  of  Z  are  in  the  upper  and  lower  quadrants.  Therefore,  we  call  any  square 
matrix  whose  nonzero  structure  follows  (2.1)  and  (2.2),  or  one  that  can  be  brought  to 
such  a  form  by  row  and/or  column  interchanges  a  left-right  quadrant  (LRQ)  matrix. 
Similarly,  any  square  matrix  whose  nonzero  structure  follows  (2.3)  and  (2.4),  or  one  that 
can  be  brought  to  such  a  form  by  row  and/or  column  interchanges  is  called  an  upper- 
lower  quadrant  (ULQ)  matrix  .  Examples  of  W  and  Z  matrices  for  an  odd  and  an  even  m 
are  given  below: 


1  0  0  0  0 

VV;i  1  0  0 

"'3.1  "'3.2  1  "3.4  "3.5  .2 

'V  4  i  0  0  1  Wi  J 

o'  0  0  0  f 


Example  2.2  (m=  6) 


1  0  0  0  0  0 

W2,1  1  0  0  0  VV2.6 

"3.1  "3.2  1  0  "3.5  "3.6 

"4.1  "4.2  0  1  "4,5  "4,6 

H’5,1  0  0  0  1  Wsa 

o'  0  0  0  0  l' 


‘1,1  z  1.2  z  1.3  z  1.4  z  1.5 

0  z  2.2  z2.2  z  2,4  0 

0  0  r3>3  0  0 

0  2 42  24.3  24.4  0 

25.1  2  5.2  25,3  Z5.4  Z  5,5 


2  1.1  2  i>2  Z\J2  Z  1,4  Z  ),6 

0  2 2.2  22.3  22.4  22.5  0 

0  0  z  3,3  z  3,4  0  0 

0  0  z4,3  Z44  0  0 

0  25.2  25,3  Z54  Zs,5  0 

2  6,1  2  6,2  Z  6,3  Z5  4  Z6,5  Z6,6 


Without  loss  of  generality  we  assume  that  m  is  even.  For  linear  programming,  we 
can  always  append  a  nonbinding  constraint  so  that  the  total  number  of  constraints  is 


The  set  of  all  LRQ  matrices  of  order  m  is  denoted  by  [M%)  and  the  set  of  all  ULQ 
matrices  of  order  m  is  denoted  by  { M^}.  Let  A  E Rmjn  and  A=A,,;.e,-.e;r.  If 
(A  +1  we  say  that  A,j  is  a  W-element  ;  otherwise,  it  is  a  non-W-elemeni .  Simi¬ 

larly,  if  A  z{Mh}  we  say  that  A,  j  is  a  Z-element ;  otherwise,  it  is  a  non-Z-elemeni. 
Proposition  2.1 

[Mm)  and  [M^)  are  closed  under  addition,  scalar  multiplication,  multiplication  and 
inversion  . 

(The  proof  of  this  Proposition  may  be  found  in  Zaki  [1986-1]). 


2.2  The  Forward  Quadrant  Interlocking  Factorization  Algorithm 

In  this  section  we  present  an  algorithm  which  obtains  the  IVZ  factorization  of  any 
nonsingular  matrix.  That  is,  given  a  nonsingular  matrix  B ,  find  IV  and  Z  such  that 
B  -  WZQ ,  where  Q  is  a  permutation  matrix.  This  factorization  is  analogous  to  the  LU 


factorization  in  common  use  in  many  production  linear  programming  packages. 
Definition  2.1 


An  elementary  left-right  quadrant  (ELRQ)  matrix  of  order  m  and  index  k  is  a  matrix  of 
the  form: 


Nk  =1  -  uk-el  -  vk-ef  (2.5) 

where 

/  =  m  -  k  +  1  ,  k  £1,2 /  2)-l,  (2.6) 

ej-uk-  0  and  epvk=0  for  i=\,2,...,k  ,1  ,l+\ . m.  (2.7) 

The  conditions  (2.7)  require  that  the  first  k  and  last  k  components  of  uk  and  v*  be  zero, 
that  is.  w^andv*  have  the  form: 

u  k  =  (0,0 0  ,uL\  A+i  ,---,uk.k  ,0,0 0)T  (2.8) 

v*  =  (0,0 0,v£+\  ,v^2, . . .  ,vk_k, 0,0,... ,0)T .  (2.9) 


In  general  an  ELRQ  matrix  of  order  m  and  index  k  has  the  form  depicted  in  Figure 
2.1.  Thus,  an  ELRQ  matrix  of  index  k  is  a  LRQ  matrix  whose  only  nonidentity  columns 
are  columns  k  and  I  {l=m~k  + 1).  ELRQ  matrices  are  easily  inverted.  It  is  apparent  that 

-t 


I\’k 


-I  +  ukel  +  vk-ef 


(2.10) 


which  is  also  an  ELRQ  matrix  of  index  k. 
Proposition  2.2 

Let 


N^  =  N]N2  ■  ■  -Nk  (2.11) 

where  N‘  is  an  ELRQ  matrix  of  index  i  ,  i=l,2,...,k.  Then  N is  a  LRQ  matrix  whose 
j'h  and  ( m-j+l )*  columns  are  those  of  NJ . 

(The  proof  of  this  Proposition  may  be  found  in  Zaki  [1986-1]). 

Definition  2.2 


0-10 


J 


A  panially  reduced  upper-lower  quadrant  (PRULQ)  matrix  of  index  k  and  order  m  is  a 
square  matrix  whose  non-Z  elements  are  zero  in  columns  1  through  £-1  and  /+1  through 
m,  where  k  -  l,2,...,m/2  and  l  =m-k+ 1.  Its  general  form  is  shown  in  Figure  2.2.  Note 
that  B 1  has  no  special  zero  structure  and  Bma  is  an  ULQ  matrix. 

Proposirion  2.3 

Let  Bk  be  a  PRULQ  matrix  of  index  k.  If  Bk  is  nonsingular  then  there  exist  j\  and  j 2 


such  that  k  <  j\<  jz^l  and 


5  =  Btj,  .Bfa-B^.Bfa  *  0. 


(2.12) 


Proof 


Suppose  5=0  for  even’  k<j\<ji<l.  Then  f?£.  must  be  a  multiple  of  B .  This  contradicts 
the  assumption  that  Bk  is  nonsingular. 

Permuting  the  columns  of  a  PRULQ  matrix  so  that  certain  elements  provide  a  non¬ 
singular  2x2  submatrix  is  analogous  to  interchanging  rows  and  columns  in  matrix  inver¬ 
sion  to  obtain  a  nonzero  pivot  element.  Now,  let  B*  be  a  nonsingular  PRULQ  matrix  of 
index  k .  Let  j\  and  j 2  satisfy  Proposition  2.3  and  define  Qk  to  be  the  permutation  matrix 


such  that 


where 


Bk =Bk  Qk 


B \  =  B  and  B  ;  =  B 


J=D.h 


(2.13) 


Let  A  be  any  square  matrix  of  order  m  and  let  kt{l,...^n/2}.  Define  Sk(A  )  to  be  the  fol¬ 


lowing  2x2  principal  submatrix  of  A 


_.  Ak.k  Ak,i 

S  {A)=  Al,k  AU 


(2.14) 


where  /  =m-k+ 1.  Using  these  definitions  and  Proposition  2.3,  it  is  clear  that 
Bk  =Bk  Qk  is  a  nonsingular  PRULQ  matrix  of  index/:  and  Sk(Bk  5  is  nonsingular. 

We  now  show  how  one  may  transform  a  PRULQ  matrix  of  index  k  into  a  PRULQ 


mm\ 


yVV 
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Figure  2.2. 


Illustration  of  a  PRULQ  matrix  of  order  m  and  index  k 


matrix  of  index  £+1. 
Proposition  2.4 


Let  B*  be  a  nonsingular  PRULQ  matrix  of  index  k  and  let  Qk  be  the  permutation  matrix 
that  interchanges  columns  k  and  m-k  + 1  with  columns  j\  and  /j,  respectively,  where  j\ 
and  ji  are  obtained  so  that  they  satisfy  Proposition  2.3.  Let  N*  be  an  ELRQ  matrix  of 
index  k  whose  uk  and  v*  vectors  are  determined  by  solving  the  following  ( m-2k )  2x2 


linear  systems 


m*  V*  5 *(£-*>=  Bkk  B?J  ,i=k+\ . m-k.  (2.15) 

Then  B*+1  =  <V*  Bk  Qk  is  a  nonsingular  PRULQ  matrix  of  index  k  +  1. 

Proof 

Since  Bk  is  nonsingular  and  Nk  is  nonsingular,  then  B*+1  is  nonsingular.  Bk+l  is  a 
PRL'LQ  matrix  of  index  k  +  1  if  all  non-Z-elements  in  columns  1  through  k  and  /  through 
111  are  zero.  Since  fi*  is  a  PRULQ  matrix  of  index  k  ,  we  only  need  to  show  that  the 
effect  of  A'*  on  Bk  is  to  zero  out  the  non-Z-elements  in  columns  kj.  To  show  this,  we 


begin  by  rewriting  (2.15)  as 


1  Bit  B'tk  r  1 

u?  v*  =  Bfo  Bti 

B'tj  Bfj  1 


or  fort  =  k+l,k+2,...,m-k 


uk-B£,k  +  vk-Btk=Bkk 
u!c-B'£J  +  vk-B'f'i=B'ti. 


(2.16) 

(2.17) 


We  now  consider  the  non-Z-elemems  of  B 
For  i  =  k  +  l,k+2....yn-k 


Bkk]=Nk.  Bkk 

=  -uk  ■  Bk  -  vk  ■  Bt.k  +  Bkk  =  0  by  (2.16). 
B?j 1  =  N*.  •  Bkj 

=  -11*  ■  B£j  -  v*  •  Btj  +  Bkj  =  0  by  (2.17). 


(2.18) 


(2.19) 


m 


HI 

El 

|<>v 

:%:v 


BkJ 1  =Nj'.-Bk'j  =0  for;=l . k-l  and  /+l,...,m.  (2.20) 

Also  we  note  that  the  desired  zeros  created  in  earlier  stages  in  Bk  are  not  affected  by  Nk 
,  since  for  i  =1 . ifc-l,/+l,...,m 

B *+1  =  N*.  •  B  *  =  e,  ■  Bk  =  Bk, .  (2.2 1 ) 

From  (2.18)  through  (2.21)  we  conclude  that£i+1  is  a  PRULQ  matrix  of  index  £+1. 


Given  the  above  definitions,  the  forward  quadrant  interlocking  factorization  algo¬ 
rithm  may  be  stated  as  follows. 

Algorithm  2.1  :  The  Forward  Quadrant  Interlocking  Factorization 

Let  .  The  following  steps  decompose  B  to  its  quadrant  interlocking  factors  with 

B  =  W  Z  Q  . 

Initialize 


B'  =  fl, 

K  =  m/2. 

Main  Loop 

For  k  =  1.2 . K- 1 

1.  Column  Permutation 

Find  ji  and  y2  satisfying  Proposition  2.3. 

If  none  exists,  then  terminate  with  the  conclusion  that  B  is  singular. 
Otherwise,  construct  Qk  using  j\  and  ji- 

2.  Compute  the  vectors  uk  ,vk 

by  solving  the  (m-2k)  2x2  linear  systems,  (2.15). 

3.  Consuuct  rv  ’• 


Nk  -I  -  uk  -ej -  vk-ej. 

4.  Construct  fi**1 

Bk*'  =  Nk  Bk  Qk. 

Next  k 
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Proposition  2.5 


Let  B  be  a  nonsingular  matrix  of  size  m.  Then  Algorithm  2.1  decomposes  B  to  its  for¬ 


ward  quadrant  interlocking  factors , 


B  =  W  Z  Q 


(2.22) 


where 


(1)  Wt  {M%]  ,  W  = 

Q)Zz(Mm)  ,z  =£*,and 

(3)  Q  is  a  permutation  matrix  ,  Q  =  ( Q]Q 2  ■  ■  ■  Q*-1)-1. 


Proof 


Let  B  1  =  B  .  Applying  Proposition  2.4  for  k  =1,2 (ml 2)— 1 ,  we  obtain 


Bk  =Nk-'Nk~2-  Q1  ■■■QK~2QK~\ 


(2.23) 


where  BK  is  an  ULQ  matrix,  NJ ,  j  =  l,...,Af-l  are  ELRQ  matrices  as  computed  in  (2.15) 


and  Q-'  are  permutation  matrices.  From  (2.23), 


B 1  =  (.V*-1  Nk~2  ■  ■  -  A/1)-1  BK  (21  •  •  .qK-2qK-\)-\ 


(2.24) 


Let  =  (Nk'~l  ■  ■  •  N1)-1  .  By  Proposition  2.2  A^-ri  is  a  LRQ  matrix.  Also, 

let  =  (g1  •  ■  •  QK~ 2  QK-I)'1.  Since  the  product  of  permutation  matrices  is  a  per¬ 

mutation  matrix,  Q is  a  permutation  matrix.  Thus,  (2.24)  can  be  written  as 


fil  =B  =N^-»  BK  Q&-V 


(2.25) 


and  (2.22)  follows  by  setting  W  =  N^K~^\Z  =  BK ,  and  Q  =  in  (2.25). 


Proposition  2.6 


Algorithm  2.1  without  column  permutations  requires 


m3/3  +  m2ll  -  4 m  /3 


multiplications  on  a  sequential  machine. 


Proof 


Ignoring  column  permutations,  we  trace  the  operations  in  the  main  loop  excluding  step  1 


The  number  of  multiplications  to  compute  uk  and  v* 


ft 

0. 

IK 


n 


Ql 


55 


AT —I 

=  £  [2  +  6(m-2k)} 

=  m  +3  m(m  -2)12. 

The  number  of  multiplications  to  compute  B  *+1 
K-\ 

=  2.  I  (m-2 k)2 


(2.26) 


=  m.  (m-l).(m-2)/3.  (2.27) 

Summing  (2.26)  and  (2.27)  we  obtain  the  specified  total  number  of  multiplications. 

In  Algorithm  2.1  the  columns  of  the  PRULQ  matrix  are  permuted  to  find  a  2x2 
matrix  with  a  nonzero  determinant.  There  are  obvious  alternatives  that  may  be  used.  To 
ensure  numerical  stability  for  instance,  we  may  find  the  matrix  whose  determinant  has 
the  largest  absolute  value,  or  the  matrix  that  has  the  smallest  condition  number.  Another 
approach  is  to  permute  the  rows  of  the  PRULQ  matrix  to  find  the  required  nonsingular 
2x2  matrix  attempting  to  minimize  fill-in  in  the  nonpivot  rows.  Both  row  and/or  column 
permutations  can  be  selected  on  numerical  stability  and/or  sparsity  grounds. 


2.3  The  Backward  Quadrant  Interlocking  Factorization  Algorithm 

Unlike  the  triangular  factors  (L,U)  of  a  matrix,  the  quadrant  interlocking  factors 
(W,Z)  possess  different  potential  density.  That  is,  the  number  of  potentially  nonzero  ele¬ 
ments  in  W  is  different  than  that  in  Z.  In  this  section  we  present  an  algorithm  which 
obtains  the  ZW  factorization  of  any  nonsingular  matrix.  We  refer  to  this  algorithm  as  the 
Backward  QIF  algorithm,  as  opposed  to  the  Forward  QIF  algorithm  of  Section  2.2  that 
produces  the  WZ  factorization.  The  development  of  this  algorithm  is  very  similar  to  the 
previous  one.  The  proofs  of  Propositions  2.7  through  2.10  in  this  section,  use  arguments 
similar  to  those  used  in  Propositions  2.2  through  2.5  and  hence  are  omitted. 


V  /vVvW/V’A/vW/  .» >  >Vv 


} 


)cfiniiion  2.3 


An  elementary  upper  lower  quadrant  (EULQ)  matrix  of  order  m  and  index  k  is  a  matrix 


of  the  form  : 


Mk  =/m  -  rk-el - sk-ej -  e^-ej -  ei-ef 


(2.28) 


where 


/  -m  -  k  +  1,  /tel ,2,  •  ■  •  jn/2, 

ej-rk= 0  and  ej-sk=0  for  i-k+l,k+2,...,l. 


(2.29) 


The  conditions  (2.29)  require  that  components  £+1  through  m-k  of  r*  and  sk  be  zero, 
which  are  the  non-Z-elements  of  rk  and  sk  in  Af  *.  That  is,  r*  and  s*  have  the  form  : 


rk  =  ('"i . r£,  0 . 0,r/U..,r*)r 

=  (J  * . j£,0 . 0,j/:,„.,j*)7'. 


(2.30) 

(2.31) 


Thus,  an  EULQ  matrix  of  index  A'  and  order  m  is  an  ULQ  matrix  whose  only  nonidentity 
columns  are  columns  k  and  /  (/=m-A+l).  In  general,  it  has  the  form  depicted  in  Figure 


The  set  of  all  nonsingular  EULQ  matrices  is  closed  under  inversion,  and  the  inverse 


of  any  nonsingular  EULQ  matrix  of  index  k  is  another  EULQ  matrix  of  index  k. 


Proposition  2.7 


Let  M{k)  =  MkMk~]  ■  ■  ■  M 1  where  M‘  is  an  EULQ  matrix  of  index  i  ,  i=l,2 . k.  Then 

M(i)  is  a  ULQ  matrix  whose  j,h  and  (m-j+ 1)*  columns  are  those  of  Mi  ,j- l,2,...,m/2. 
The  proof  ‘  -  similar  to  that  of  Proposition  2.2. 


Definition  2.4 


A  partially  reduced  left-right  quadrant  (PRLRQ)  matrix  of  index  k  and  order  m  is  a 


square  matrix  whose  non-W -elements  are  zero  in  columns  A  +  l  through  m-k.  Note  that 
Bm'2  has  no  special  zero  structure  and  B 1  is  an  LRQ  matrix.  In  general,  a  PRLRQ  matrix 


is  of  the  form  shown  in  Figure  2.4. 


Let  Bk  be  a  PRLRQ  matrix  of  index  k.  If  Bk  is  nonsingular  then  there  exist  j\  and  jz 
such  that  1  <  j i  < k  and  /  <  ji<m  and 

5  =  Btj,  '  BjcJ1  -Bt,n  ■  Bf'Jl  *  0  (2.32) 

The  proof  is  similar  to  that  of  Proposition  2.3. 

Now  let  j  j  and  jz  satisfy  Proposition  2.8  and  define  Pk  to  be  a  permuted  identity 
matrix  with  column  j\  in  the  k,h  position  and  jz  in  the  Ith.  Let  Bk  be  a  nonsingular 
PRLRQ  matrix  of  index  k.  Obviously,  Bk  -Bk  Pk  is  a  nonsingular  PRLRQ  matrix  of 
index  k  ,  and  Sk(Bk )  is  nonsingular. 

Using  Mk  of  (2.28)  and  the  Pk  defined  above,  the  elimination  operation  needed  to 
reduce  a  PRLRQ  matrix  of  index  k  a  step  further  is  given  by  the  following  Proposition. 
Proposition  2.9 

Let  be  a  nonsingular  PRLRQ  matrix  of  index  k  ,  let  j\  and  jz  satisfy  Proposition  2.8, 
let  Pk  be  the  permutation  matrix  that  permutes  columns  k  and  jx  and  columns  m-£  +  l 
and  jz-  Let  Mk  be  an  EULQ  matrix  of  index  k  whose  rk,sk  vectors  are  determined  by 
solving  the  following  2k  -2  linear  systems 

rk  sk  .S(Bk)=  [fl**  Bk,  ,  i=l . k-l  and/+l,...^i  (2.33) 

along  with  the  system 

it  a-34* 

Then  =  Mk  Bk  Pk  is  a  nonsingular  PRLRQ  matrix  of  index  k- 1. 


Given  the  above  definitions,  we  may  state  the  backward  QIF  algorithm  as  follows: 


Alcoriihm  2.2  :  The  Backward  Quadrant  Interlocking  Factorization 


Let  B  zR  m  •m .  The  following  steps  decompose  B  to  its  QIF  with  B  -  Z  W  P . 


Initialize 


Bm :  =  B , 

K  -mil. 

Main  Loop 

For  k  =K,K-l,K-2 . 1 

1.  Column  Permutation 

Find  j  i  and  j  j  satisfying  Proposition  2.8. 

If  none  exists,  then  terminate  with  the  conclusion  that  B  is  singular. 
Otherwise,  construct  Pk  using  j\  and  j 2- 

2.  Compute  the  vectors  rk  ,sk 

by  solving  the  (24-1)  2x2  linear  systems  (2.33)  and  (2.34). 

3.  Construct  Mk 

Mk  =  1  m  ~rk  ej ~sk  ej -ek-ej -evej. 

4.  Construct  Bk~x 

Bk~l 2 3  =  Mk  Bk  Pk. 

Next  k 

Proposition  2. 10 

Let  B  be  a  nonsingular  matrix  of  size  m.  Then  Algorithm  2.2  decomposes  B  to  its  back¬ 
ward  QIF, 

B  =  Z  W  P  (2.35) 

where 

(1)  Ze  {M$,}  ,  Z  =  {MXMZ  ■  ■  ■ 

(2)  We  (MX)  ,  IV  =  B  ‘.and 

(3)  P  is  a  permutation  matrix  ,  P  ={PK  ■  ■  ■  P  ‘)_1 . 

The  proof  is  similar  to  that  of  Proposition  2.5. 

As  with  the  Forward  QIF  Algorithm,  row  and/or  column  permutations  can  be 
adopted  to  ensure  numerical  stability  and/or  sparse  factors. 

D-20 


2.4  Some  Characteristics  of  Quadrant  Matrices 

In  this  section  we  reveal  a  relationship  between  the  quadrant  and  triangular 
matrices,  which  has  not  previously  appeared  in  the  open  literature  (e.g.  Evans  and  Hatzo- 
poulos  [1979-1],  Evans  and  Hadjidimos  [1980-1],  Evans  [1982-1],  Feiimeier  [1982-1], 
Hellier  [1982-1],  and  Shanehchi  and  Evans  [1982-1]).  A  permutation  algorithm  that  res¬ 
tructures  any  quadrant  matrix  as  a  block  triangular  one  is  presented. 

Consider  the  following  matrices 


9  m 

1  0  x  x  .  x  x 

X  X 

1  X  X  .  X  X 

X  X 

1  0  .  x  x 

X  X  X  X 

1  .  X  X 

z  = 

X  X  X  X 

,  w  = 

■  .  . 

X  X  X  X  XX 

X  X  X  X  XX 

i  6 

1 

_ 

Where  x  stands  for  a  potential  nonzero  element.  Note  that  Z  is  a  lower  Hessenberg 
matrix  with  a  special  zero  distribution  on  the  superdiagonal.  Also,  W  is  a  unit  upper  tri¬ 
angular  matrix  with  special  zero  distribution  on  the  superdiagonal. 

Now  we  present  an  algorithm  that  relates  W  of  (2.1)  and  Z  of  (2.3)  to  W  and  Z*  of 
(2.36). 

Algorithm  2.3  :  The  Permutation  Aleorithm 

Let  R  ,  S ,  and  7  be  square  matrices  of  order  m,  where  R  is  the  input  matrix  to  the  algo¬ 
rithm  and  7  is  the  output  matrix.  The  following  algorithm  permutes  the  columns  and 
rows  of  R  such  that: 

(a)  if/?  is  a  LRQ  matrix  then  7  is  a  W  of  (2.36),  and 

(b)  if  R  is  a  ULQ  matrix  then  7  is  a  Z*  of  (2.36). 


1.  Column  Permutation 


2.  Row  Permutation 
For  i =l,2,...,m/2 

7"m-2i+l..  $i,. 

T/n—2i  +2..  *  ^/7i 

Next  t 


An  example  of  the  permutation  algorithm  is  given  below  form  =6. 


Example  2.3  (m=  6) 


■ 

■ 

* 

1 

0 

0 

0 

0 

0 

1 

0 

*3.2 

W3.5 

*3,1 

W3.6 

u'2.1 

1 

0 

0 

0 

WZ6 

0 

I 

w4.2  w4.5 

W41 

^4.6 

U'3,j 

vv'3.2 

1 

0 

w3,5 

W3.6 

,VV  = 

0 

0 

1 

0 

W2.1 

WZ6 

W4.1 

vv4,2 

0 

1 

VV4.5 

W'4.6 

0 

0 

0 

1 

W5.1 

W5.6 

^5.1 

o' 

0 

0 

1 

W5.6 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

1 

*  " 

*  " 

2  1.1  2  1.2  2  1.3  2  1.4  2  1,5  2  1,6 

2  3,3  2  3,4  0  0  0  0 

0  2  2.2  2  2.3  2  2.4  2  2.5  0 

2  4,3  2  4,4  0  0  0  0 

0  0  Z}j  Z34  0  0 

,z  = 

2Z3  2 2,4  2Z2  22,5  0  0 

0  0  Z43  Z44  0  0 

25J  2  3.4  2  5,2  2 5.5  0  0 

0  Z5.2  25.3  25,4  Z5.5  0 

2 1,3  2  1,4  2 1,2  2 1,5  2 1.1  2 1.6 

2  6.1  2  6.2  2  6,3  2  6.4  2  6.5  2  6.6 

2  6,3  2  6,4  2  6.2  2  6,5  2  6.1  2  6.6 

This  clearly  shows  that  the  quadrant  matrices  are  permuted  block  triangular 
matrices  with  blocks  of  size  2.  That  is,  the  Forward  (Backward)  Quadrant  Interlocking 
factorization  is  equivalent  to  a  block  Doolittle  (Crout)  decomposition  with  blocks  of  size 


On  sequential  computers,  a  QIF  is  not  expected  to  be  faster  than  any  triangular 


decomposition.  Since  computing  the  entries  of  the  factors  by  solving  2x2  systems 
requires  more  operations,  as  shown  in  Proposition  2.6.  Also,  finding  a  nonsingular  2x2 
submatrix  is  more  expensive  than  finding  a  nonzero  element.  However,  on  parallel  com¬ 
puters.  the  QIF  is  expected  to  be  competitive,  since  the  number  of  entries  that  can  be 
produced  concurrently  in  every  stage  is  doubled,  and  the  number  of  stages  is  halved  as 
compared  to  a  triangular  factorization  algorithm.  Therefore,  we  may  view  the  column 
permutation  step  in  Algorithms  2.1  and  2.2  searching  for  a  nonsingular  2x2  submatrix  as 
a  computation  decoupling  price  we  pay  for  the  concurrency  gained  in  steps  2-4. 

Determining  the  relationship  between  quadrant  and  triangular  matrices  is  a  key 
observation  that  we  will  use  in  the  following  section  to  design  appropriate  updating 
scheme  for  the  quadrant  interlocking  factors  of  the  basis  matrix  in  the  simplex  method. 


in.  UPDATING  THE  QIF  OF  THE  BASIS 


At  the  beginning  of  a  simplex  iteration,  suppose  the  basis  has  the  form 

B=ZWR,  (3.1) 

where  we  assume  forms  (2.36)  for  Z  and  W ,  and  R  is  a  permutation  matrix.  When  the 
entering  column  A  j  replaces  the  leaving  column  B  at  the  end  of  the  simplex  iteration, 
we  have  a  new  basis  matrix  B  which  is  related  to  the  previous  basis  matrix  B  by  the  for- 


B  =  B  E  (3.2) 

where  E  is  an  eta  matrix  whose  pth  column  is  (B-1  A  j),  and  all  other  columns  are  the 
identity  columns.  From  (3.1)  and  (3.2)  B  can  be  written  as 

B  =  ZW  R  E.  (3.3) 

An  updating  scheme  is  a  sequence  of  operations  applied  to  the  right  side  of  (3.3)  to 
return  it  to  the  form  given  by  (3.1),  i.e. 

B  =  ZWR,  (3.4) 

where  W  ,  Z  are  the  new  Q.I.  factors  and  R  is  a  permutation  matrix.  We  present  an  algo¬ 
rithm  designed  to  derive  (3.4)  given  (3.3).  It  is  similar  to  the  Forrest-Tomlin  [1972-1] 
update  for  the  triangular  factors  of  the  basis.  Since  the  spike  is  in  W ,  our  strategy  is  to 
reduce  the  spiked  W,  i.e..  WE,  to  an  LRQ  matrix  using  elementary  ULQ  matrices.  The 
following  algorithm  exploits  the  triangular  form  of  W  and  the  existence  of  2x2  identity 
blocks  on  the  diagonal  of  W . 

In  this  presentation  we  use  the  term  brother  columns  (rows)  to  indicate  columns 
(rows)  that  have  the  same  potential  nonzero  structure,  excluding  the  diagonal  entries  in 
case  of  LRQ  matrices.  Thus,  for  LRQ  matrices  in  the  form  of  (2.36)  columns  (rows) 


-r  a  r,  / 


i,i  + 1  are  brother  columns  (rows)  for  (=1,3,  ■  *  •  jn-\. 

The  first  step  of  this  scheme  is  a  column  permutation  followed  by  a  row  permuta¬ 
tion.  In  Figure  3.1  an  example  is  presented  to  illustrate  this  step,  in  which  R  of  (3.3)  per¬ 
mutes  columns  2  and  4  of  IV  and  x  stands  for  potentially  nonzero  elements.  Thus,  IV  and 
WR  are  as  illustrated  in  Figure  3.1  (a)  and  (b).  From  (3.3)  we  obtain 

Z-‘  B  =W  R  E 

=  s\ 

where  5  is  illustrated  in  Figure  3.1  (c)  and  y  stands  for  the  elements  of  the  column  vector 
(Z-1  A  j).  Note  that  if  (Z-1  A  j)  has  the  same  zero  structure  as  W  ?,  then  the  new  fac¬ 
tors  are  immediately  available.  That  is,  IV  is  S  and  Z  is  Z.  If  this  is  not  the  case,  we 
place  S  in  a  spiked-VV  form  S  as  shown  in  Figure  3.1  (d),  by  applying  the  column  permu¬ 
tation  R~l  to  5  to  undo  the  effect  of  R .  That  is, 

Z~l  B  R~x  =  IV  R  E  R 
-S  R~l 


Suppose  q  <  m- 1.  We  apply  the  column  permutation  R  to  S ,  placing  the  spike  and  the 
brother  of  the  leaving  column  in  the  positions  m  and  m-1,  respectively,  and  moving  all 
intervening  columns  forward  to  produce  the  matrix  //*?,  as  illustrated  in  Figure  3.1(e). 
We  then  apply  the  row  permutation  R~l  lo  H*  placing  the  q,h  row  and  its  brother  row  in 
positions  m  and  m- 1,  respectively,  moving  all  intervening  rows  two  places  up  to  pro¬ 
duce  the  matrix  H ?  as  shown  in  Figure  3.1  (f),  where 

_  _  )  q,  if  q  is  odd  \ 

Q  ~  |  q-l,  if  q  is  even. 

Note  that  q  is  odd.  Of  course,  if  q  tm-l,  then  R  =  /.  Now  (3.5)  becomes 
R~l  Z_1  B  R~]  R  =R~]  \V  R  E  R~l  R 

-r‘~1  s  r-'r 

=  R-1  s  R 
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Figure  3.1.  Illustration  of  the  double  column  and  row  permutation 

(m=8,p=2, 4=4, 4=3  ). 


1 

l 

m 
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Figure  3.2.  Illustration  of  the  general  form  of  the  matrix  Hl . 
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Consider  the  matrix  Hl  whose  general  form  is  depicted  in  Figure  3.2.  Note  that  the 
matrix  resulting  from  the  above  permutation  is  Hl  when  l  =q.  Note  also  that  all  non- 
W-elements  in  H‘  are  in  the  last  two  rows  in  columns  l  through  m .  Our  objective  now  is 
to  reduce  Hi  to  a  LRQ  matrix  by  eliminating  these  non-W-e!ements.  We  consider  elim¬ 
inating  them  four  at  a  time  using  the  2x2  identities  on  the  diagonal  of  Hl .  The  necessary 
matrices  that  should  reduce  Hl  to  Hl+2,  for  l=q,q+2,  •  •  •  ,m-3,  are  the  following  EULQ 
transformations. 


/-I  / 


By  repetitive  application  of  Zl  to  Hl ,  for  /  =q,q+  2,  •  •  •  ,m- 3,  we  get  Hm~}  which,  in 
general,  has  a  non-W-elemem  in  its  m-\jn  entry  and  a  nonconforming  element  in  the 
in  ,m!h  entry.  Therefore,  the  following  rank -one  elementary  transformation  is  sufficient  to 
reduce  Hm~]  to  the  LRQ  matrix  W , 
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»« 

I* 


■ 


— 1  t  rj n  —\ 

~nm-\,m  '  nm,m 


Theoretically,  «m,m  is  a  nonzero  element,  since  otherwise  B  is  singular.  Now,  combin¬ 
ing  ail  transformations  applied  to  //? ,  we  obtain, 

Zm-'  Zm~~  ■  Z*  H*=W, 

and  (3.6)  becomes, 

{Zm-\  zm~-  ■■■  ZiR-'Z-'JB  {R-lR}=Zm~1  Z'""3  •••  Z«  H?  (3.7) 


/z-1;  B  { R~x }  =  W , 


which  is  equivalent  to  the  required  updated  form  (3.4),  with 
Z  =Z  R  Z<~’  ■■■  Zm-y'Zm~v\ 


R  -  R~]  R  ,  and 


H'  =  Zm"1  Zm“3  •  •  ■  Z?  //«". 


Note  that  Z  in  (3.8)  is  not  a  ULQ  matrix,  even  though  all  its  factors,  except  the  permuta¬ 
tion  matrix  R ,  are  ULQ  matrices.  In  practice,  Z-1  is  stored  factorized  as  in  the  first 
braced  term  in  the  left  hand  side  of  (3.7). 


Using  the  above,  the  updating  algorithm  may  be  stated  as  follows: 
Algorithm  3.1  :  The  B.O.l.F  Updaiini  Algorithm 

0.  Begin  with  the  m  x  m  matrix  B  =  Z  W  R  ,  and  suppose  column  p  of 
B  is  replaced  by  A  ; . 

1 .  Define  q  such  that  R  „  =eq. 


m\ 


i; 


( 


i 


3.  Let 


4.  Set 


5.  H  <^R~‘  S  R. 


6.  Let 


For  !~q.q~ 2.  •  •  •  ,m-3. 


7.  Set 


otherwise. 


^  Wj,oth 


<7+1,  if  q  is  odd ; 
<7~L  if  <7  w  even. 


,  1  <  i  <  <7 ; 
e,+2.  <7  <  m-1; 

e?-,/  =  m -1; 
ea ,  i  =  m. 


_  _  J  q ,  if  <7  is  odd ; 

^  “  1  <7-1,  if  <7  jj  even. 


1,/=/; 

-Hm.h  i-m  \ 
0,  otherwise. 


l,/=/+l; 

hi  tt\  —  1 ,  /  *  l ,  < ~rn  “ ~  1 , 

hi  m  j + 1 » ^  , 

0,  otherwise. 


Z  j  *-ej,j*l  and /*/+!. 


»ijjy 

5!' 


8 .H  <-Z'  H. 


Next  l . 


9.  Set 


H/n  - 1  .m  jn  i  t  —tTl  1 1 

1  yn  i  t  — ^  i 

0,  otherwise. 


Z”-‘  *—  Cj,  j*m. 


10.  H  <r-Z 1  //. 

11.  Set 

b  ={z  r  (z?)-1  •  •  •  (Z'"-1)-1;  #  f/r1  /? ; 


=  Z  li'  /?. 

This  updating  scheme  inherits  the  major  characteristics  of  the  Forrest-Tomlin 
update  for  the  triangular  factors  of  the  basis.  First,  no  new  nonzeros  are  created  in  the 
right  factor  U\  since  only  deletions  of  items  are  required.  Therefore,  sparsity  of  W  is 
preserved  and  fill-in  is  minimized.  Second,  the  lack  of  choice  of  the  pivot  elements 
makes  this  update  less  numerically  stable  than  the  Banels-Golub-based  updates.  Thus, 
there  is  a  gain  in  speed  and  storage  at  some  sacrifice  in  numerical  stability. 


IV.  parallel  implementation 


In  this  section  we  describe  a  parallel  implementation  of  two  basic  tasks  of  any  sim¬ 
plex  based  linear  programming  code,  namely,  basis  reinversion  and  solution  of  the  linear 
systems.  A  parailei  version  of  the  Backward  Quadrant  Interlocking  Factorization  Algo¬ 
rithm  (BQIF)  is  presented  in  Section  4.1.  Only  the  left  factor  is  produced  in  its  product 
form  while  the  right  factor  is  produced  in  its  explicit  form.  This  form  conforms  with  the 
updating  scheme  of  Section  III.  In  this  algorithm,  parallelism  is  gained  by  reformulating 
the  BQIF  Algorithm  in  terms  of  high-level  modules  such  as  matrix-vector  operations. 
These  modules  represent  a  high  level  of  granularity  in  the  algorithm  in  the  sense  that  they 
are  based  on  matrix-vector  operations,  0(m 2)  work,  not  just  vector  operations,  0{m) 
work.  The  module  concept  has  already  proven  to  be  very  successful  in  achieving  both 
transportability  and  high  performance  of  some  linear  algebra  routines  across  a  wide  range 
of  architectures,  as  reported  by  Dongarra  and  Sorensen  [1984-2]  and  Dongarra  and 
Hew'itt  [1986-1], 

Giver,  a  basic  feasible  solution  with  basis  B,  each  iteration  of  Dantzig’s  simplex 
algorithm  involves  solving  the  systems  of  equations  nB-cD  and  B  y -A  j.  An 
efficient  parallelization  of  the  simplex  algorithm  requires  efficient  parallel  algorithms  for 
solution  of  these  systems.  Parallel  algorithms  for  solving  these  linear  systems  using  the 
quadrant  factors  are  presented  in  Section  4.2.  The  parallel  implementation  discussed  in 
this  section  is  proposed  for  an  MIMD  parallel  computer  that  incorporates  p  identical  pro¬ 
cessors  sharing  a  common  memory'  and  capable  of  multitasking,  that  is,  the  processors 
are  capable  of  applying  all  their  power  to  a  single  job  in  a  timely  and  coordinated 
manner. 

4.1  The  Module-Based  BQIF  Algorithm 
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Given  an  m  x  m  matrix  B ,  the  algorithm  either  indicates  singularity  of  B  or  pro¬ 
duces 

Zm_iZm-3...zi5  R  =Wf  (4.i} 

where  R  is  a  permutation  matrix,  Zk  is  a  rank-2  matrix  of  the  form, 


Zk  = 


k  k+l 


k 

k+l 


(4.2) 


This  form  conforms  with  the  updating  schemes  of  Section  III.  Its  LU  version  has  been 
used  in  several  LP  codes  (Reid  [1982-1])  .  At  every  stage  a  new  Z'  is  produced  and  two 
rows  of  \V  are  updated.  The  availability  of  the  updated  rows  of  W  at  every  stage  allows 
for  parallel  implementation  when  searching  for  a  nonsingular  2x2  submatrix.  Moreover, 
it  facilitates  finding  the  2x2  submatrix  of  largest  determinant  rather  than  finding  one  with 
a  nonzero  determinant.  This  reduces  the  rounding  error  in  the  factorization  process  and 
hence  improves  the  numerical  accuracy  of  the  results  (Shanehchi  and  Evans  [1982-1]). 

The  major  pan  of  the  algorithm  is  formulated  in  terms  of  three  basic  modules: 

Module  1 :  Search  for  a  nonsingular  2x2  submatrix 
Input  :  A  zRln 

Purpose  :  Find  column  indices  j\  and  ji  such  that 

DET  —  A  i _/ ]  •  A  ij 2  -  A  ij j  .  A  \j%  *  0. 

Output  :  Ji,  jz,  and  DET  or  a  singular  indication. 
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Module  2  :  Matrix  -  2  vectors  product 
Input  :  v1  e/?"1,2,  A  1  zRn',n\x:  tR"1,2. 

Purpose  :  Compute  y 1  such  that  y 1  <—  y 1  +  A  1  x 


Output  :  y 


Module  3  :  2  vectors  -  matrix  product 


Input  :  v 2  zR  2,/:,  x2  eR  2,/\  A  2  tR 

Purpose  :  Compute y2 such  that y 2  <—  y 2  +  x2  A2. 


Output  :y2. 


These  modules  represent  a  high  level  of  granularity  in  the  algorithm  in  the  sense 


that  they  are  based  on  matrix-vector  operations,  0(m2)  work,  not  just  vector  operations, 


0  (m )  work. 


Algorithm  4 .]  ;  Reinversion 


Let  BeRmm.  Then  the  following  steps  produce  a  singular  indication  if  B  is  singular,  or 


decompose  B  as  in  (4.1)  if  B  is  nonsingular.  The  column  indices  are  stored  in  IP  IT  (m ). 


Define  the  2x2  submatrix  S,  j(A )  to  be 


c  (A  \  A,'J  A,J+l 
Al+W  A1+i,;>1 


0.  Initialize. 


1:2,1  :m 


For  i  =1,3,  ■  ■  ■  jn~ 3 


1.  Find  a  nonsingular  2x2  submatrix. 


Set  n  «—  m  -  i  +  1  and  A  <— 


Call  Module  1  (A  ,n ). 


If  A  is  singular,  then  terminate  with  B  singular, 


;• .  .  ;■  -  v  •,  •/  v  v  v  v  v.v.v. /,  /,  .--aa'Nj.n 


» 


otherwise,  permute  columns 

with  H'i .mj.  and  Wlm  with  \Vl:m  ;i. 
Record  permutation,  JP\T (i)=j  j,  1PVT ( /  +\)=j2 . 

2.  Obtain  a  new  Z‘ . 

Z‘  4—  /,  where  /  is  m  x  m,  S,j(Z‘)  [£,.,  ( W)]_1. 

n  i  «—  m-i -1,  nj  «—  i-l. 

For  /  =  1,3,  •  ■  • ,i  -2 
,  l 

A  .J;l+\  1:1+1  ■ 


Next  /. 

Call  Module  2  (y3.r3,4  3 ,«!,/» 2) 


►2.m  > 


3.  Update  rows  /  +2.  /  +3  of  H' . 

/ 1  4—  t+1.  /:  *—  m  -i  -M. 

A  *  <—  W  l;i  Bi-rl.i+'i.i+l.n 

For  /  =  1,3.  •  ■  • 


*  */:/  +  !  tu'-i- 


Next  / . 


Call  Module  3  (v2,xM:,/i,/:) 

»I  r  , 2 

1  *2:i >  • 


Next  t . 


4.  Update  H'. 

For  /  =1.3.  ■  ■  ■  jn- 3 

5, .,  (IF ) «-  / .  where  /  is  2  x  2. 

^  i  :j *  1 ,1  +2:m  *  St  i(Z  )  :i  +  l.i**-2:n 


The  general  approach  we  propose  for  parallel  implementation  involves  having  the 
parent  processor  prepare  the  parameters  for  a  module  and  make  use  of  the  kids  (subtask 
processors)  to  work  concurrently  on  that  module.  In  Module  1,  at  most  n  (n- 1)/2  column 
pairs  should  be  checked.  The  parent  sends  to  each  kid  the  column  indices  to  be  checked 
for  nonsingularity,  and  stops  all  kids  whenever  one  succeeds.  As  mentioned  before,  it  is 
possible  to  find  the  nonsingular  2x2  submatrix  of  largest  determinant.  To  do  this,  the 
parent  sends  the  column  indices  to  the  kids,  each  kid  finds  the  column  pair  of  largest 
determinant  in  his  list,  sends  them  to  the  parent,  then  the  parent  selects  the  best  by  com¬ 
paring  only  p-\  values. 

The  concurrency  in  Modules  2  and  3  is  obvious  since  they  involve  matrix-vector 
operations.  In  Module  2  (matrix  -  2  vectors  product)  parallelism  is  obtained  by  perform¬ 
ing  In  j  independent  inner  products,  where  a  j  is  the  row  dimension  of  the  matrix.  Simi¬ 
larly.  in  Module  3  (2  vectors  -  matrix  product)  concurrency  is  gained  by  executing  2/; 
independent  inner  products,  where  /;  is  number  of  columns  of  the  matrix.  Step  3  needs 
on!;.’  from  Step  2.  These  are  the  first  two  row's  of  y1.  Thus,  as  soon  as  these 

elements  become  available  Step  3  may  proceed.  This  can  easily  be  synchronized.  Finally, 
in  Step  4  the  loop  divides  over  i  with  completely  independent  tasks.  However,  the  tasks 
require  different  amounts  of  computation.  Two  solutions  are  possible.  Either  we  adopt 
dynamic  task  queue  allocation,  or  we  statically  allocate  i=l,m-3  to  one  processor, 
i  =3,  m  -5  to  the  second,  and  so  on. 

4.2  Solving  the  Linear  Systems 

In  this  section  we  investigate  the  possible  parallelism  involved  when  we  solve  the 
systems  of  equations  xB  =  cB  and  B  y  We  assume  that  the  basis  matrix  B  is  in 


Zm~\  z*-3  ■  ■  ■Z'B  R  =  W, 


where  Zk  has  the  form  (4.2),  and  W  is  a  block  unit  upper  triangular  matrix  with  blocks  of 


size  2.that  is  it  has  the  form  (2.36).  We  compute  the  dual  variables  (rt)  using  the  follow¬ 


ing  steps: 


(1)  Permutation  :  k  =  cbR. 


(2)  Solve  a  block  triangular  system  :  tilF  =  ;t. 

(3)  BTRAN  :  7t  =  nZm-'  Zm~^  ■  •  •  ZK 


We  compute  >\  the  basis  representation  of  the  incoming  column  A  as  follows: 

(1)  FTRAN  :  v  =  Zm_1  Zm_3  •  •  Zl  Aj. 


(2)  Solve  a  block  triangular  system  :  W  y  =  y. 


(3)  Permutation  :y=R  y. 


W'e  present  parallel  implementations  of  the  FTRAN  operation,  the  solution  of  a 
block  triangular  system,  and  the  BTRAN  operation  in  Sections  4.2.1,  4.2.2,  and  4.2.3, 


respectively. 


4.2.1  The  FTRAN  in  Parallel 


The  rules  for  applying  a  Z*  to  an  arbitrary  vector  v  are  as  follows: 

a)  Extract  ak  *~vk,  and  a*+1  <-  v*+i. 


b)  Set  v*  «-  0,  and  vi+1  0. 


c)  Compute  v  =  v  +  a*  Z^.*  +  o*+1  Z£m,i+i . 

Note  that  if  v*  =  vi+[  =  0,  then  v  =  v  and  no  element  of  v  will  change. 


An  example  is  now  given  for  m  -  6,  k  =  3.  Suppose  we  have 


■..3:4  = 


0  0 
0  0 
2  1 
1  2 
1/3  1/4 
1/6  1/2 


and  u  = 


Then  the  computation  of  Z3v  is  given  by 


and  the  computation  of  Z3u  is  given  by 


These  rules  are  implemented  in  the  following  module: 

Module  :  FTRAN  Operation  ( A  ,v,n) 

Purpose  :  Apply  Zk  to  an  arbitrary  vector  v. 

Input  :  n,  A  eRn’Z,v  zRnA. 

Output  :  v,  where  v  =Zk  v. 

Steps  :  1.  Extract  aj  «-  v1(  and  a2  <—  V2. 

2.  Set  v  i  <—  0,  and  v2  <—  0. 

3.  Compute  A  j  *-  oti  A  j. 

4.  Compute  A  ,2  <—  Ct2  A  ,2- 

5.  Compute  v  <—  v  +  A  j  +  A  t2. 

Obviously,  steps  3  and  4  are  independent  and  can  be  executed  in  parallel.  In  step  5, 
the  work  is  partitioned  over  the  rows  of  v ,  assigning  each  kid  a  block  of  rows  to  evaluate. 
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4.2.2  Solving  the  Block  Triangular  System 


The  solution  of  an  m  x  m  triangular  system  of  equations  on  a  sequential  computer 
can  be  obtained  by  either  a  forward  or  backward  substitution  process  which  requires 
O  (m:)  steps,  each  defined  as  one  multiplication  followed  by  one  addition.  In  order  to 
solve  the  system  on  a  parallel  computer,  methods  which  require  0(m3)  processors  and, 
hence,  reduce  the  computation  time  to  0(log:m)  have  been  developed  (  e.g.  Chen  and 
Kuck  [1975-1]  and  Sameh  and  Brent  [1977-1]  ).  Evans  and  Dunbar  [1983-1]  introduced 
methods  that  run  in  O  (m )  time  using  0  ( m )  processors.  For  practical  purposes  the  pro¬ 
cessor  and  storage  requirement  of  these  methods  is  unreasonably  large. 

In  this  subsection  we  consider  solving  the  linear  system 

atlV  =  b ,  (4.4) 

where  x,  b  tRm  and  \V  is  an  upper  triangular  m  x  m  matrix  with  2x2  identity  diagonal 
blocks.  This  system  may  be  solved  by  a  forward  substitution  (FS)  process  described  in 
algorithmic  form  as  follows. 

For  i  =  1,2.  •  •  •  ,m 

i-i 

*<■  =£>.--  I  wi.j  Xj. 

/=’* 

Next  i . 

It  is  obvious  that  a  uniprocessor  will  solve  (4.4)  sequentially  in  m(m-2)/2  steps  by  the 
FS  process.  Let  Tp  denote  the  time  required  to  solve  (4.4)  using  p  processors,  where  one 
step  requires  one  unit  of  time.  Then 

T i  =  m  ( m  -2)12. 

With  a  parallel  computer  that  has  p  processors,  a  minimum  time  requirement  for  the 
solution  of  (4.4)  is 

min  {Tp  )-T\!  p  -  m  ( m  -  2  )  /  (  2  p  ).  (4.5) 
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The  minimum  completion  time  of  any  algorithm  based  on  FS  is  equal  to  the  number  of 
terms  in  the  expression  that  evaluates  xm,  that  is 

T  min  —  rn  —2. 

Front  (4.5)  it  is  clear  that  a  minimum  of  m  12  processors  is  necessary  to  solve  (4.4)  in  the 
minimum  time  of  m-2  operations.  Again  this  processor  requirement  is  unreasonably 
large  for  our  application. 

The  machine  we  consider  has  a  limited  number  of  identical  processors  ( p  <30). 
Therefore,  we  consider  the  question:  if  we  arc  given  a  fixed  number  of  processors,  how 
should  the  parallel  operations  be  scheduled  on  the  processors  to  minimize  the  solution 
time  of  (4.4)?  We  propose  to  answer  this  question  using  a  directed  graph  model  that 
represents  the  FS  process  as  follows.  The  nodes  of  the  graph  represent  tasks  of  equal  exe¬ 
cution  time  and  the  edges  represent  the  precedence  relationships  between  the  tasks.  Then 
we  apply  a  simple  scheduling  algorithm  due  to  Hu  [1961-1],  called  the  level  algorithm,  to 
schedule  the  tasks  on  the  processors  such  that  the  total  execution  time  is  minimized.  This 
algorithm  is  known  to  be  optimum  for  a  tree  graph,  and  it  gives  extremely  good  results 
for  general  graphs  as  reported  by  Ramamoonthy  et  al  [1972-1],  Huang  and  Wing  [1979- 
1].  and  Wing  and  Huang  [1980-1]. 

We  first  organize  the  FS  process  in  terms  of  operations  of  equal  time  and  define  the 
corresponding  directed  graph.  Let  x‘  =[;q  ,Xj+i].  Partition  x,  b,  and  W  into  blocks  of 
size  2.  Using  51>;  as  defined  in  (4.3),  the  above  FS  process  can  then  be  written  as 


For  i  =  1,3,  ■  •  •  ,m-l 


x‘  -bl 
Next  i . 


xi  SijiW). 

,i-2 


Let  the  following  operation,  where  x‘  is  used  to  update  x> ,  define  a  task 

xJ  ^xJ -x‘ (4.5) 
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For  Hu's  algorithm  we  assume  that  the  execution  time  of  an  operation  (4.5)  is  one  unit  (4 
multiplications  and  4  additions).  We  can  see  that  the  FS  process  consists  of  a  set  of 
operations  (4.5),  on  which  a  set  of  precedence  relations  exists.  That  is,  to  complete  the 
evaluation  of  x‘  we  require  x‘~ 2,  for  /  =  3,5,  •  •  •  jn- 1.  The  process  can  therefore  be 
represented  by  a  directed  graph  G(V  ,E)  where  the  vertex  set  V  is  defined  as 

Vs/Vj  j  I  Vij  represents  an  operation  ( 4.6 )}, 
and  the  edge  set  E  is  defined  as 

Es{ (v,  j  ,vkj)  \ operation  vkj  requires  the  direct  result  of  operation  v,- ,J. 

We  shall  call  G  (V  ,E)  the  forward  substitution  task  graph,  and  refer  to  it  by  FSTG. 
In  Figure  4.1  the  FSTG  for  m  =10  is  presented.  For  every  v)>;  in  the  FSTG,  the  pair  i,j  is 
indicated.  A  node  is  an  initial  node  if  it  does  not  have  a  predecessor  and  is  a  terminal 
node  if  it  has  no  successor.  It  is  clear  that  the  FSTG  has  only  one  terminal  node,  at  which 
i  =  m  -3  and  j  =  m  -1.  Accordingly,  the  minimum  completion  time,  denoted  by  D ,  of  the 
FSTG  is  equal  to  the  number  of  nodes  on  the  longest  path  from  an  initial  node  to  the  ter¬ 
minal  node.  Thus,  D  =  (m!2)  -  1,  which  is  the  number  of  times  operation  (4.6)  is  exe¬ 
cuted  for.rm~'. 

We  next  determine  the  levels  of  the  vertices  of  the  FSTG.  Define  the  level  number 
( lij )  of  a  node  Vj j  as  follows:  1)  the  level  of  the  terminal  node  is  D ,  2)  the  level  of  a 
node  that  has  one  or  more  successors  is  equal  to  the  minimum  of  the  levels  of  its  succes¬ 
sors  minus  one.  Applying  this  definition  to  the  FSTG,  we  can  conclude  that 

lij  *  ( i  +  1 )  /  2.  (4.6) 

The  level  number  is  simply  the  latest  time  by  which  node  vtj  must  be  processed  in  order 
to  complete  the  task  graph  in  the  minimum  time  D .  The  level  numbers  of  the  nodes  of 
Figure  4.1  are  given  as  shown. 

Once  the  level  numbers  of  the  operations  are  determined,  we  apply  Hu’s  scheduling 


The  FSTG  for 


algorithm  to  assign  operations  to  processor.;.  Define  a  ready  task  to  be  one  whose 
immediate  predecessors  have  all  been  processed.  The  scheduling  algorithm  is  as  follows. 


Algorithm  4.2:  Hu's  Scheduling  Algorithm 


1.  Among  all  the  ready  tasks,  schedule  the  one  with  smallest  level  number. 


2.  If  there  is  a  tie.  schedule  the  one  with  the  largest  number  of  immediate  succes- 


Applving  this  Algorithm  to  the  FS  process  represented  by  FSTG,  the  computations 


are  organized  as  follows. 


Algorithm  4.3  :  Forward  Substitution 


Set  X1  <-bK 


Font  =  3,5.  ,m -l 


xk  Su( W). 


Next  k . 


For  i  =  3,  •  •  •  jn-7> 


For  j  =  1+2./+4,  ■  ■  yn-1 


x>  <-.t ■'  -x‘  Si'j(W). 


Next  j . 


Next  i . 


All  operations  in  loop  k  are  independent  and  have  the  same  level  number.  Their 


level  number  (luk  =  1)  is  the  smallest  among  all  other  operations  in  the  Algorithm,  and 


hence  they  are  executed  first.  Similarly,  all  operations  in  loop  j  are  independent  and  have 


the  same  level  number  as  given  by  (4.6).  The  ordering  of  index  i  predicates  the  execution 


of  the  operations  by  increasing  level  number.  This  satisfies  the  first  criterion  in  Hu’s 


Algorithm.  The  second  criterion  imposes  the  ordering  of  the  index  j .  That  is,  the  number 


of  immediate  successors  of  v1>;  is  always  greater  than  or  equal  to  that  of  v(>;+2  for 


mmm 


VV. 


j  =  J+2./+4,  ■  ■  ■  ,m- 1. 


A  parallel  implementation  of  Algorithm  4.3  involves  having  the  parent  processor 
partition  the  work  in  loop  k  among  the  kids.  Then  for  every  i,  the  computational  tasks  of 
loop  j  are  again  divided  among  the  kids. 


Lower  bounds  on  the  completion  time  of  a  task  graph  given  a  fixed  number  of  pro¬ 
cessors  were  derived  by  Ramamoorthy  et  al.  [1972-1].  Let  n*  be  the  number  of  nodes  in 
level  k .  Let  :*  (p )  be  the  minimum  completion  time  to  process  a  task  graph  with  p  pro¬ 
cessors.  Then 


r*  (p)>  max 


J —  +  D  -  i 
P 


(4.7) 


where  D  is  the  minimum  completion  time  of  the  task  graph  and  [x]  denotes  the  smallest 
integer  >.r .  The  first  term  in  the  expression  denotes  the  minimum  number  of  time  units 
required  to  complete  all  the  operations  of  the  first  i  levels  using  p  processors.  The  term 
D  -i  is  equal  to  the  number  of  remaining  levels  yet  to  be  processed.  This  bound  may  be 
useful  in  demonstrating  optimality  of  the  scheduling  using  Hu’s  Algorithm. 


4.2.3  Parallel  Implementation  of  the  BTRAN  Operation 

In  this  section,  we  consider  the  parallel  implementation  of  the  following  operation 

k  =  7 izm-!  Zm~ 3  •  Z\ 

where  ~  is  an  arbitrary  vector  of  m  elements  and  each  Z*  is  an  m  x  m  rank-2  matrix  that 
has  the  form  (4.3). 

The  rule  for  computing  u  =  u  Zk  is  as  follows: 
al  Set  (7,  <—  u ,  for  i*k  and  i*k  + 1. 

b)  Set  iik  <-  uk Z^,*. 

c)  Set  /7t*i  <—  uk-m  Z^;m,i+1 . 
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For  example,  let  m  =  6.  k  -  3  and  suppose  we  have 


0  0 

z\:4  =  12  .and  «  =[  1  1  1  1  1  1  ]. 

3  4 
6  2 

Then  «su2**[l  1  12  9  1  1  ]. 

Note  that  i~  differs  from  u  in  only  the  k'h  and  the  k+la  elements.  Note  also  that  the 
elements  u,,  i= 1,  •  •  •  .Jk  — 1.  are  not  required  in  computing  u.  Using  these  observauons, 
the  BTRAN  process  may  be  represented  by  the  following. 


For  k  -  m  - 1 ,  •  •  ■ ,  1 
Uk  *-  Uk:m  Z£-.m.k- 

<  Ukm  ^K:m,k~ \  ■ 


Next k . 


We  now  apply  the  methodology  stated  at  the  end  of  the  previous  subsection.  Let  the  fol¬ 


lowing  operations  define  a  task 


uk  uk  Sk,k(Zk). 


u>  <—  uJ  +  u‘  Sij(ZJ).  (4.9) 

We  assume  that  the  execution  urne  of  both  operarions  is  one  unit.  The  task  graph 
G  (V  ,E )  of  the  BTRAN  process  is  defined  by  the  vertex  set  V ,  where 


an  operation  (4.8),  if  i-j\ 
an  operation  (4.9),  otherwise 


V  =•{  vi  ;  !  v,  j  represents 


and  the  edge  set  £,  where 

£  =f(vi;  ,  vk!)  I  operation  vkj  requires  the  direct  result  of  operation  vl;j. 


MV' 


C  0',£ )  has  only  one  terminal  node  at  which  i  =  3  and  j  =  1.  Following  the  same  argu¬ 
ments  used  earlier  with  FSTG,  we  conclude  that 


£>  =m  / 2, 


ana 


l,  if  i 

(  m  -i  +  3  )  /  2,  otherwise. 


Applying  Hu's  Algorithm  to  the  BTRAN  task  graph  yields  the  following  ordering 
of  computations. 

Algorithm  4.4  :  BTRAS  Operation 
For  k  =  m  -1  ,m  -3,  ■  •  •  ,1 
uk  uk  Skik(Zk). 


Next  k . 


For  i  =  m  -l.m  -3,  •  •  •  ,3 
For  j  =  i  •  •  •  ,1 

UJ  <-  UJ  +  u‘ 

Next  j . 

Next  i . 


The  ordering  of  the  index  i  is  imposed  by  the  first  criterion  of  Hu’s  Algorithm.  The 
ordering  of  the  indices  k  and  j  is  the  result  of  applying  the  second  criterion.  Parallelism 
is  gained  by  having  the  kid  processors  work  first  on  loop  k  in  parallel,  and  then  for  every 
i ,  having  the  kid  processors  work  on  loop  j  in  parallel. 


V.  SUMMARY 


Evans  and  Hatzopoulos  [1979-1]  developed  a  new  matrix  factorization,  known  as 
the  Quadrant  Interlocking  Factorization  (QIF),  for  solving  linear  systems  on  parallel 
computers.  In  this  paper  we  have  presented  the  algorithms  required  to  use  this  new  fac¬ 
torization  in  Dantzig’s  simplex  algorithm  for  linear  programming.  This  work  may  be 
viewed  as  a  parallelization  of  the  simplex  method  using  a  quadrant  interlocking  factori¬ 
zation  for  the  basis  inverse. 

In  Section  II,  the  factorization  algorithms  are  developed,  and  the  relationship  of 
quadrant  and  triangular  matrices  is  presented.  In  Section  III,  a  new  algorithm  is  presented 
for  updating  the  factorization  during  a  basis  exchange  step.  In  Section  IV,  we  present  a 
parallel  implementation  of  the  factorization  algorithm,  and  develop  the  algorithms 
required  to  solve  the  linear  systems  of  the  simplex  method  on  a  parallel  computer  using 
the  QIF  of  the  basis.  For  each  algorithm  the  concurrency  among  the  steps  is  revealed,  the 
computations  are  organized  and  a  parallel  implementation  is  proposed.  The  algorithms 
are  designed  for  an  MIMD  parallel  computer  that  incorporates  p  identical  processors 
sharing  a  common  memory  and  capable  of  applying  all  their  power  to  a  single  applica¬ 
tion  in  a  timelv  and  coordinated  manner. 
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PREFACE 


The  objective  of  this  investigation  is  to  computationally  test  parallel 


algorithms  for  finding  minimal  spanning  trees.  Computational  tests  were  run  on 


a  single  processor  using  Prim's,  Kruskal's  and  Boruvka's  algorithms.  Our 


implementation  of  Prim's  algorithm  is  superior  for  high  density  graphs,  while 


our  Implementation  of  Boruvka's  algorithm  is  best  for  sparse  graphs.  Implemen¬ 


tations  of  parallel  versions  of  both  Prim's  and  Boruvka's  algorithms  were 


tested  on  a  tventy-cpu  Balance  21000.  For  the  environment  in  which  a  minimum 


spanning  tree  problem  is  a  subproblera  within  another  algorithm,  the  parallel 


implementation  of  Boruvka's  algorithm  produced  speedups  of  three  and  five  on 


five  and  ten  processors,  respectively;  while  the  parallel  implementation  of 


Prim's  algorithm  produced  speedups  of  three  and  five  on  five  and  ten 


processors,  respectively.  The  one-time  overhead  for  process  creation  negates 


most,  if  not  all  of  the  benefits  for  solving  a  single  minimum  spanning  tree 


subproblera. 
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I.  INTRODUCTION 


The  United  States  along  with  other  developed  countries  is  entering  a  new 
generation  of  computing  that  will  require  software  engineers  to  redesign  and 
reevaluate  standard  algorithms  for  the  new  parallel  processing  hardware  that  is 
being  installed  throughout  the  developed  world.  It  may  well  be  that  algorithms 
which  proved  to  be  superior  for  single  processor  machines  may  prove  to  be 
inferior  in  some  of  the  new  parallel  processing  environments.  One  of  the  more 
popular  new  parallel  machines  is  Sequent  Computer  Systems'  Balance  21000.  The 
objective  of  this  investigation  is  to  computationally  test  parallel  algorithms 
for  finding  minimal  spanning  trees  on  a  twenty-cpu  Balance  21000. 

An  undirected  graph  G  «  [ V , E ]  consists  of  a  vertex  set  V  and  an  edge  set 
E.  Without  loss  of  generality  we  assume  that  the  edges  are  distinct.  If  G*  ■ 
[V',E']  is  a  subgraph  of  G  with  V'  ■  V,  then  G'  is  called  a  spanning  subgraph 
for  G.  If,  in  addition,  G'  is  a  tree,  then  G'  is  called  a  spanning  tree  for  G. 
A  graph  whose  components  are  trees  is  called  a  forest,  and  a  spanning  subgraph 
for  G,  which  is  also  a  forest,  is  called  a  spanning  forest  for  G.  We  will  call 
{[Vi.Ti];  -  (uj,  T^  •{,  t  V)  the  trivial  spanning  forest  for  G  and  the 

[Vi.Ti]  trivial  trees.  Associated  with  each  edge  (u,v)  is  a  real-valued  cost 
c(u,v).  The  minimum  spanning  tree  problem  may  be  stated  as  follows:  Given  a 
connected  undirected  graph  each  of  whose  edges  has  a  real-valued  cost,  find  a 
spanning  tree  of  the  graph  whose  total  edge  cost  is  minimum. 

Applications  include  the  design  of  a  distribution  network  in  which  the 
nodes  represent  cities  or  towns  and  the  edges  represent  electrical  power  lines, 
water  lines,  natural  gas  lines,  communication  links,  etc.  The  objective  is  to 
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design  a  network  wMch  uses  the  least  length  of  cable  or  pipe.  The  minimum 
spanning  tree  problen  is  also  used  as  a  subproblem  for  algorithms  for  the 
travelling  salesman  problen  (see  Held  and  Karp  [6,  7]  and  Ali  and  Kennington 
[3]).  Some  vehicle  routing  algorithms  require  the  solution  of  a  travelling 
salesman  problem  on  a  subset  of  nodes.  Hence,  a  wide  variety  of  applications 
require  the  solution  of  minimal  spanning  trees.  Some  applications  require  a 
single  solution  and  some  use  the  model  as  a  subproblem  within  another 
algorithm. 
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II.  THREE  CLASSICAL  ALGORITHMS 


The  algorithms  in  current  use  may  be  traced  to  ideas  developed  by  Prim, 
Kruskal,  and  Boruvka.  These  three  classical  algorithms  all  begin  with  the 
trivial  spanning  forest  Gq  ■  { [ .T^ 3 »  i  ■  0,...,  |  V  |-1 }.  A  sequence  of 
spanning  forests  is  obtained  by  merging  spanning  forest  components.  Given 
spanning  forest  G^,  a  nonforest  edge  (u,v)  is  selected  and  the  components 

and  [V j,Tj]  with  u  t  and  v  £  Vj  are  removed  from  G^  and  replaced  by 

[V£,T£],  where  £  *  k  +  |  V  | ,  V  £  andTj-^UTjU  {(u.v)),  yielding 

spanning  forest  After  m  ■  |  V  | — 1  edges  have  been  selected,  Gm  » 

{[V2m,T2m])  ■  {[V,T]}  is  a  minimal  spanning  tree  for  G. 

Let  [V^.TjJ  and  [vj>Tj]  denote  two  disjoint  subtrees  of  G.  Define  d^j, 
the  shortest  distance  between  the  trees,  by  d^j  *  min  (c(u,v):  (u.v)  e  E,  u  e 
^i*  v  e  Vj)*  The  three  classical  algorithms  may  be  viewed  as  different 
applications  of  the  following  result: 

Proposition  1. 

Let  Vq,  Vl . Vn  denote  vertex  sets  of  disjoint  subtrees  of  a  minimum 


spanning  tree  for  G.  Let  c(u,v)  -  dK_  -  min  d,-_  with  (u.v)  e  V  .  x  V„.  Then 

J‘*  j^n  J  n 

(u,v)  is  an  edge  in  a  minimal  spanning  tree  for  G. 


A  proof  of  Proposition  1  may  be  found  in  Christofides  [4,  pp.  135-136]. 


In  Prim's  algorithm,  the  nonforest  edge  (u,v)  for  G^  is  always  selected  so 
that  (u.v)  e  Vi  x  Vj*,  where  j*  is  the  largest  index  j  such  that  [Vj.Tj]  c  G^. 
Thus  a  single  component  continues  to  grow  as  trivial  trees  disappear.  An  ex¬ 


cellent  description  of  Prim’s  algorithm  is  given  in  Papadimitriou  and  Steiglitz 
[15,  p.  273],  along  with  its  (serial)  computational  complexity  of  0( [ V | ^).  It 


is  believed  that  this  algorithm  is  best  suited  for  dense  graphs. 
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In  Boruvka's  algorithm,  the  nonforest  edge  (u,v)  for  is  always  selected 
so  that  (u,v)  c  Vi#  x  V y  where  i*  is  the  smallest  index  i  such  that  [ V i .T^ ]  c 
Gk.  Thus  a  variety  of  dif ferent-sized  components  may  be  produced  as  the 
algorithm  proceeds.  All  trivial  trees  will  be  removed  first  in  the  early 
stages  of  this  algorithm.  A  description  of  Boruvka’s  algorithm  is  given  in 
Papadimitriou  and  Steiglitz  [15,  p.  277],  along  with  its  (serial)  computa¬ 
tional  complexity  of  0(|E|  log  |V|).  This  algorithm  appears  to  be  best  suited 
for  sparse  graphs. 

Kruskal's  method  may  be  viewed  as  an  application  of  the  greedy  algorithm. 
The  minimum  spanning  tree  is  constructed  by  examining  the  edges  in  order  of 
increasing  cost.  If  an  edge  forms  a  cycle  within  a  component  of  G^,  it  is 
discarded.  Otherwise  it  is  selected  and  yields  G^+i*  Here  also  different- 
sized  components  may  be  produced.  A  description  of  Kruskal's  algorithm  is 
given  in  Sedgewick  [18,  pp.  412-413],  along  with  its  (serial)  computational 
complexity  of  0(|El  log  |E|). 


III.  COMPUTATIONAL  RESULTS  WITH  SEQUENTIAL  ALGORITHMS 


Computer  codes  for  Boruvka's  algorithm,  Kruskal's  algorithm,  and  three 
versions  of  Prim's  algorithm  were  developed.  SPARSE  PRIM  maintains  the  edge 
data  in  both  forward  and  backward  star  format,  while  DENSE  PRIM  maintains  the 
edge  data  in  an  |v|  x  |v|  matrix.  HEAP  PRIM  maintains  the  edge  data  in  both 
forward  and  backward  star  format  and  makes  use  of  a  d-heap  as  described  in 
Tarjan  [19,  p.  77].  KRUSKAL  makes  use  of  a  partial  quick  sort  as  described  in 
[1,  8]  to  produce  the  least  cost  remaining  edge.  BORUVKA  is  a  straightforward 
implementation  of  the  algorithm  presented  in  [15]. 

Random  problems  were  generated  on  both  n  x  n  grid  graphs  and  on  completely 
random  graphs.  All  costs  were  uniformly  distributed  on  the  interval 
[0,  maxcost].  All  codes  are  written  in  FORTRAN  for  the  Balance  21000. 

The  computational  results  for  grid  graphs  are  presented  in  Table  1.  These 
graphs  are  very  sparse  and  BORUVKA  was  the  clear  winner.  The  computational 
results  for  random  graphs  may  be  found  in  Tables  2  and  3.  SPARSE  PRIM  was  the 
winner  for  problems  whose  density  was  at  least  40 %  with  HEAP  PRIM  running  a 
close  second.  For  problems  with  densities  of  20%  or  less,  HEAP  PRIM  was  the 
winner  with  RRUSKAL  running  a  close  second.  KRUSKAL  appeared  to  be  the  most 
robust  implementation,  working  fairly  well  on  all  problems  tested. 
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Table  1.  Comparison  of  Sequential  Algorithms  on  Grid  Graphs 
(cost  range  is  0  -  10,000) 


Grid  Size 
o  x  n 

Edges 

Graph 

Density 

DENSE 

PRIM 

SPARSE 

PRIM 

HEAP 

PRIM 

IRUSRAL 

boruvka 

15  x  15 

420 

1.72 

1.70 

.36 

.27 

.19 

.12 

18  x  18 

612 

1.22 

3.54 

.74 

.42 

.30 

.17 

20  x  20 

760 

1.02 

5.43 

1.10 

.54 

.39 

.21 

24  x  24 

1,104 

.72 

11.32 

2.19 

.82 

.63 

.30 

28  x  28 

1,512 

.52 

21.01 

4.09 

1.13 

.86 

.46 

30  x  30 

1,740 

.42 

27.82 

5.41 

1.37 

1.15 

.55 

J  Total  Tine  (secs.) 

|  70.82  |  13.89  |  4.55  |  3.52  | 

1  1.81  | 

|  Rank 

1  5  |  4  |  3  |  2  | 

1  1  1 
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Table  2.  Comparison  of  Sequential  Algorithms  on  High  Density  Random  Graphs 

(cost  range  is  0  -  10,000) 


j  Vertices 

Edges 

Graph 

Density 

DENSE 

PRIM 

SPARSE 

PRIM 

HEAP 

PRIM 

KRUSRAL 

BORUVKA 

200 

19,900 

100s 

1.39 

1.14 

1.44 

1.52 

3.01 

200 

15,920 

80S 

1.39 

.97 

1.22 

1.52 

1.96 

200 

11,940 

60S 

1.39 

.79 

.99 

.96 

1.47 

200 

7,960 

40S 

1.39 

.61 

.76 

.89 

1.02 

400 

79,800 

100S 

5.67 

4.55 

5.42 

4.45 

12.03 

400 

63,840 

80S 

5.69 

3.85 

4.53 

3.58 

10.28 

400 

47,880 

60S 

5.70 

3.13 

3.62 

2.82 

7.26 

400 

31,920 

40S 

5.71 

2.49 

2.68 

1.97 

4.85 

600 

179,700 

100% 

13.28 

10.39 

11.98 

12.38 

29.85 

600 

143,760 

00 

O 

^4 

13.66 

8.79 

9.99 

14.99 

23.72 

600 

107,820 

60% 

13.16 

7.15 

7.99 

10.63 

17.79 

600 

71,880 

40% 

13.02 

5.55 

5.67 

6.05 

11.80 

|Total  Time 

(secs. ) 

|  81.45 

49.41 

56.29 

61.76 

125.04  | 

|  Rank 

4  |  1 

2 

3 

5  1 
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Table  3.  Comparison  of  Sequential  Algorithms  on  Low  Density  Random  Graphs. 

(cost  range  is  0  -  10,000) 


Vertices 

Edges 

Graph 

Density 

DENSE 

PRIM 

SPARSE 

PRIM 

HEAP 

PRIM 

KRUSKAL 

BORUVIA 

200 

3,980 

20Z 

1.40 

.44 

.49 

.50 

.52 

200 

1,990 

10Z 

1.40 

.36 

.39 

.40 

.35 

200 

995 

5Z 

1.39 

.32 

.32 

.35 

.17 

400 

15,960 

20Z 

5.66 

1.75 

1.62 

1.47 

2.46 

400 

7,980 

10Z 

5.71 

1.40 

1.12 

1.53 

1.30 

400 

3,990 

5Z 

5.72 

1.21 

.86 

1.20 

.72 

600 

35,940 

20Z 

13.04 

3.94 

3.39 

3.99 

6.02 

600 

17,970 

10Z 

13.04 

3.05 

2.14 

2.89 

2.86 

600 

8,985 

5Z 

13.07 

2.73 

1.50 

2.12 

1.52 

(Total  Time 

(secs. ) 

60.43 

15.20 

11.83 

14.45 

15.92  | 

|Rank 

5 

1  3 

1 

2 

4  I 
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IV.  parallel  algorithms 


Para1 lei  versions  of  the  three  classical  algorithms  have  appeared  in  the 
litei-.ure  (see  [2,  5,  9,  10,  11,  12,  16,  17]},  however;  no  computation 
experience  has  been  reported.  The  overhead  required  for  coordinating  the  work 
of  multiple  processors  can  only  be  determined  by  actual  implementation  on  a 
parallel  processing  machine. 

A  parallel  version  of  Boruvka's  algorithm  was  developed  for  grid  graphs 
and  a  parallel  version  of  Prim's  algorithm  was  developed  for  high  density 
random  graphs.  Both  algorithms  use  modules  (subroutines)  which  may  be  executed 
in  parallel.  Suppose  there  are  p  processors  available  for  use.  The  parallel 
operations  are  initiated  by  the  main  program  using  statements  of  the  form: 

for  m  ■  1  to  p,  fork  module  z(m). 

The  main  nrogram  and  p-1  clones  will  each  execute  module  z_  in  parallel. 
Processing  does  not  continue  in  the  main  program  until  all  processors  complete 
module  z.  The  argument  "m"  allows  each  of  the  p  processors  to  process 
different  parts  of  the  data  or  follow  a  different  path.  We  assume  that  all 
data  in  the  main  program  is  shared  with  module  z.  If  module  z  has  local  non- 
shared  variables,  then  these  will  be  explicitly  stated  in  the  description  of 
the  module.  Multiple  processors  which  update  the  same  variable,  set,  or  list 
use  locks  to  insure  that  only  one  processor  has  access  to  a  given  item. 


4.1  Parallel  Boruvka  For  Grids 


Using  the  fork  and  lock  constructs  we  present  a  parallelization  of  Boruvka’s 
algorithm  for  grid  graphs.  The  most  expensive  component  of  Boruvka's 
sequential  algorithm  may  be  described  by  the  following  procedure: 
for  all  (u,v)  e  E 

let  i  and  j  denote  the  subtrees  containing  u  and  v,  respectively: 
if  i  f  j  then 

if  cost(u,v)  <  min(i)  then  min(i)  4-  cost(u,v) 
if  cost(u,v)  <  min(j)  then  min(j)  4-  cost(u,v) 
end  if 
end  for 

That  is,  all  the  edge  costs  must  be  examined  and  certain  subtree  data  are 
updated.  Our  parallelization  of  this  scan  relies  upon  a  partitioning  of  the 
grid  into  p  components  (one  for  each  processor).  A  three  processor  parti¬ 
tioning  of  a  7  x  7  grid  network  is  illustrated  in  Figure  1. 

The  above  edge  scan  is  performed  in  two  stages.  The  first  stage  performs 
a  parallel  scan  over  edges  both  of  whose  vertices  lie  within  the  same  partition. 
The  second  stage  performs  a  parallel  scan  over  edges  across  cut  sets.  If  each 
partition  consists  of  at  least  two  rows  of  the  grid,  then  all  subtree  data  up¬ 
dating  can  be  performed  independently  without  the  requirement  of  a  lock. 

The  second  part  of  Boruvka's  algorithm  is  to  merge  two  subtrees  by 
appending  a  new  edge.  The  merger  of  subtrees,  both  of  which  lie  in  the  same 
partition  can  also  be  executed  in  parallel. 

Using  this  data  partitioning  approach,  the  parallel  algorithm  may  be 
stated  as  follows: 
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Figure  1.  A  Three  Processor  Partitioning  of  a  7  x  7  Crld  Graph. 


PARALLEL  BORUVXA  FOR  GRIDS 


Input:  1.  An  n  x  n  grid  graph  G  »  [ V , E ]  with  V  ■  {v^,.,.,  v^}. 

2.  For  each  edge  (u,v)  e  E  a  cost  c(u,v). 

3.  The  number  of  processors,  p,  available  for  use. 

Output:  A  minimal  spanning  tree  [V,T]. 

Assumption:  G  is  connected  and  has  no  parallel  edges, 
begin 

T  4-  0,  r  4-  fn/pl.  I  4-  n  -  rp; 

If  r  <  2,  terminate. 

for  i  -  1  to  q,  Si  4-  { v± ) ; 

C  4—  (S^ , . . . ,  Sq } > 

Wx  4-  {v:  v  c  V  and  v  is  in  grid  rows  1  through  v  +  t); 
for  a  »  2  to  p, 

Wm  4-  {v:  v  c  V  and  v  is  in  grid  rows  (m-l)r  +  t  +  1  through  mr  +  {  ) ; 

for  m  -  1  to  p,  Xlffl  4-  {(u.v):  (u.v)  c  E,  u  c  WB,  and  v  e  Wm); 

for  m  ■  1  to  p  -  1, 

X2m  l(u-v)’-  (u.v)  E  Ewith  u  e  Wffi,  v  c  Wb+1  or  u  e  WB+1,  v  e  WBJ; 
for  i  -  1  to  q,  cpu(i)  4-  m,  where  e  Wm; 

(comment:  Sq  are  assigned  to  the  p  processes) 

create  p-1  clones 

(comment:  create  p-1  additional  processes  and  place  them  in  the  wait 
state) 

while  | C 1  f  1 

for  m  •  1  to  p,  fork  module  edgescan(l.m): 

(comment:  forks  are  executed  in  parallel  and  processing  does  not  continue 
in  the  main  program  until  all  processes  complete  edgescan) 

for  m  -  1  to  p-1,  fork  module  edgescanf 2.m) ; 

L  4-  0; 


r 


for  m  -  1  to  p,  fork  module  merge(m) ; 


for  all  (u,v)  c  L  do 

let  and  Sj  be  the  sets  containing  u  and  v,  respectively; 
if  |Si|  <  |Sj |  then 

4-  Sj,  C  4-  C\Sjj 

else 

Sj  4-  S4  U  Sjt  C  4-  C\S±: 
end  if 

T  4-  T  U(u,v); 
end  for 
end  while 
kill  the  clones 


module  edeescan(k.m) 
begin 

(comment:  k  *  1  implies  the  scan  is  within  partition  m, 

k  *  2  implies  the  scan  is  across  the  cut  set  separating  rartitions 
m  and  m  +  1) 

for  all  (u , v)  e  Xkr 

let  Si(  Sj  be  the  sets  containing  u  and  v,  respectively; 
if  i  #  j  then 

if  c(u,v)  <  min(i)  then  min(i)  4-  c(u,v),  shortest(i)  4-  (u,v); 
if  c(u,v)  <  min(j)  then  min(j)  4-  c(u,v),  shortest(j)  4-  (utv); 
end  if 

(comment:  shortest(i)  is  the  least  cost  edge  incident  on  S^) 
end  for 


"'i*: 


module  meree(m) 
begin 


for  all  vk  £  Vm  do 
(u,v)  4-  shortest(k) 

let  Si(  Sj  be  the  sets  containing  u  and  v,  respectively; 
if  i  +  j  then 

if  cpu(i)  ■  cpu(j)  then 
if  | St |  <  |Sj |  then 

si  4-  SiU  Sjt  C  4-  C\Sj ; 
else 

Sj  4-  U  Sj,  C  4*  C\S^; 
end  if 
lock  T 

T  4-  T  U((u,v)) 
unlock  T 
else 
lock  L 

L4LU  { C  u ,  v ) } 


unlock  L 


4.2  Parallel  Prim 


The  most  expensive  part  of  Prim's  sequential  algorithm  is  to  find  a 
minimum  entry  in  an  j  V  |  length  array.  This  search  can  be  allocated  over  p 
processors,  each  of  which  finds  a  candidate  minimum.  The  best  of  the  p  candidates 
becomes  the  global  minimum.  Under  the  assumption  that  parallel  edges  do  not 
exist,  there  is  also  a  scan  of  edges  over  the  forward  and  backward  star  of  a 
given  node  which  can  be  executed  in  parallel.  Data  partitioning  via  the  use  of 
independent  cut  sets  could  also  be  used  for  random  graphs  in  a  manner  similar 
to  that  described  in  Section  4.1.  That  has  not  been  done  in  this 
investigation. 

The  parallelization  of  Prim's  algorithm  may  be  stated  as  follows: 

PARALLEL  PRIM 

Input:  1.  A  graph  G  •  [ V , E ]  with  V  «  {vj .  vn). 

2.  For  each  edge  (u.v)  c  E,  a  cost  c(u,v). 

3.  The  number  of  processors,  p,  available  for  use. 

Output:  A  minimal  spanning  tree,  [V,T]. 

Assumption:  G  is  connected  and  has  no  parallel  edges, 
begin 

U  4-  {vj) ,  w  4-  vj,  T  4-  0  ; 
for  i  »  1  to  n,  d(i)  4-  ®  ; 
create  p-1  clones 

(comment:  create  p-1  additional  processes  and  place  them  in  a  wait 
state) 

F  4-  {(w,v)  c  E); 

partition  F  into  mutually  exclusive  sets  Fj,...,Fs,  s  <  p; 
for  m  •  1  to  s,  fork  module  f orwardscan(m) : 

B  4-  ( (u ,w)  c  E); 
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partition  B  into  mutually  exclusive  sets  Bj,...,Bt,  t  <  p; 
for  m  -  1  to  t,  fork  module  backwardscan(ro) : 
while  U  f  V  do 
globalmin  4-  ® ; 

for  a  ■  1  to  p,  fork  module  nodescan(m) ; 

(comment:  forks  are  executed  in  parallel  and  processing  does  not 

continue  in  the  main  program  until  all  processes  complete 
nodescan) 

T  4-  T  U  (e(ibest) ) ,  U  4-  U  U{w); 

F  4-  ((w,v)  e  E); 

partition  F  into  mutually  exclusive  sets  Fj .  Fs,  s  <  p; 

for  d  «  1  to  s,  fork  module  forvardscan(m) : 

B  4-  { ( u i w)  e  E); 

partition  B  into  mutually  exclusive  sets  Bj,...,  Bt,  t  p; 
for  m  •  1  to  t,  fork  module  backwardscan(m) ; 
end  while 
kill  the  clones 
end 
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module  nodescan(m) 
local  data:  min,  x 
begin 

min  4-  •> 

for  i  ■  m  to  n  step  p  do 

if  d(i)  <  min  then  min  4-  d(i),  x  4-  i; 
end  for 

lock  globalmin 

if  min  <  globalmin  then  globalmin  4-  min,  ibest  4-  x,  w  4-  v 
unlock  globalmin 


module  fowardscan(m) 
begin 

for  all  (u.v)  c  Fm  do; 

if  c(u,v)  <  d(v)  then  d(v)  4-  c(u,v),  e(v)  4-  (u,v); 
end  for 


module  backwardscanfm) 
begin 

for  all  (u,v)  e  Bm  dc; 

if  c(u,v)  <  d(u)  then  d(u)  4-  c(u,v)F  e(u)  4-  (u,v); 
end  for 


V.  COMPUTATIONAL  RESULTS  WITH  PARALLEL  ALGORITHMS 
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Both  algorithms  of  Section  IV  were  coded  in  FORTRAN  for  the  Balance  21000 
located  in  the  Center  for  Applied  Parallel  Processing  at  Southern  Methodist 
University.  The  Balance  21000  is  configured  with  twenty  NS32032  cpu'a,  32 
Mbytes  of  shared  memory ,  and  16K  user-accessible  hardware  locks.  Each  cpu  has 
8  Kbytes  of  local  RAM  and  8  Kbytes  of  cache.  The  Balance  21000  runs  the  DYNIX 
operating  system,  a  version  of  UNI!  4.2bsd.  DYNII  includes  routines  to  create, 
synchronize,  and  terminate  parallel  processes  from  C,  Pascal,  and  FORTRAN.  More 
details  about  the  Balance  21000  say  be  found  in  [13]. 

Table  4  gives  the  computational  results  with  Boruvka's  algorithm.  The 
times  are  wall  clock  times  and  are  the  average  for  three  runs.  The  first  row 
in  each  table  contains  the  time  for  the  sequential  version  of  B0RUVKA  and  all 
other  rows  contain  times  for  the  parallel  version.  The  sequential  version  is 
250  lines  of  code,  while  the  parallel  version  required  over  400  lines.  The 
speedup  for  a  row  is  calculated  by  dividing  the  best  sequential  time  by  the 
time  in  that  row. 

Initially,  the  parallel  code  creates  the  additional  processes  to  be  used 
and  requires  each  of  them  to  build  data  tables  which  give  the  location  in 
virtual  memory  of  all  shared  data.  Once  this  is  done,  the  processes  can  be 
used  repeatedly  with  little  system  overhead.  However,  this  initial  creation 
and  the  subsequent  killing  of  those  processes  at  termination  can  be  very 
expensive  for  this  type  of  problem.  The  first  column  of  times  includes  the 
creation  and  process  termination  time  while  the  second  does  not.  Hence,  if  a 
350  x  350  minimal  apanning  tree  was  to  be  obtained  one  time,  then  the  best 
speedup  is  2.6  using  seven  cpu’s.  If  however,  this  is  a  subprogram  of  a  larger 
system,  then  a  350  x  350  problem  can  yield  a  apeedup  of  four  on  six  processors 
and  a  speedup  of  five  on  ten. 


Table  4.  Parallel  Boruvka  on  350  z  350  Grid  Graph 
|V|  -  122,500  | E j  -  244,300 

(cost  range  is  0  -  100,000) 


|  cpu's  I 


PARALLEL  BORUVKA 
(includes  process  creation) 


time 


98.21 

112.57 

66.93 

50.26 

40.25 

39.00 

38.69 

37.70 
40.98 
42.49 
41.30 


speedup 


1.00 

.87 

1.47 

1.95 

2.44 

2.52 

2.54 

2.60 

2.40 

2.31 

2.38 


PARALLEL  BORUVKA 

(excludes  process  creation) 

time 

speedup 

98.21 

1.00 

103.86 

.95 

57.49 

1.71 

40.92 

2.40 

29.95 

3.28 

26.52 

3.70 

23.45 

4.19 

21.62 

4.54 

21.58 

4.55 

20.85 

4.71 

17.52 

5.61 

best  sequential  BORUVKA  code 
*  parallel  code  run  with  a  single  processor 
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Table  5.  Parallel  Prim  on  G  -  [V,E]  with  | V J  »  900  and  j £ j  -  404,550 

(cost  range  is  0  -  100,000) 


cpu's 


PARALLEL  PRIM 

(includes  process  creation) 

PARALLEL  PRIM 

(excludes  process  creation) 

time 

speedup  1 

time 

speedup 

24.88 

1.00 

24.88 

1.00 

27.09 

.92 

26.98 

.92 

23.35 

1.07 

15.12 

1.65 

22.63 

1.10 

10.84 

2.30 

25.31 

.98 

8.74 

2.85 

28.43 

00 

GO 

• 

7.39 

3.37 

31.54 

.79 

6.62 

3.76 

36.51 

.68 

6.03 

4.13 

41.08 

•61 

5.62 

4.43 

46.04 

.54 

5.30 

4.69 

50.54 

.49 

5.02 

4.96 

best  sequential  PARALLEL  PRIM  code 
*  parallel  code  run  with  a  single  processor 


Table  5  gives  the  computational  results  with  Prim's  algorithm.  No  speedup 
is  achievable  for  a  one-time  solution.  For  environments  in  which  the  minimum 
spanning  tree  problem  is  a  subproblem,  speedups  of  three  and  five  were  obtained 
on  five  and  ten  processors,  respectively. 


VI.  SUMMARY  AND  CONCLUSIONS 


Five  computer  codes  were  developed  to  solve  the  minimum  spanning  tree 
problem  on  a  sequential  machine.  These  codes  were  computationally  compared  on 
both  grid  graphs  and  random  graphs  whose  densities  varied  from  5 Z  to  1003.  The 
implementation  of  Boruvka's  algorithm  (see  [15,  p.  277])  was  the  best  for  grid 
graphs.  An  implementation  of  Prim's  algorithms  using  a  sparse  data  representa¬ 
tion  (see  [15,  p.  273])  was  best  for  high  density  random  graphs  while  an  imple¬ 
mentation  of  Prim's  algorithm  using  a  d-heap  (see  [19,  p.  77])  was  best  for 
lower  density  random  problems.  Kruskal's  algorithm  using  a  quicksort  is  the 
most  robust  of  all  the  implementations,  ranking  either  second  or  third  in  all 
computational  tests.  Both  Boruvka's  and  Prim's  algorithms  were  parallelized  by 
the  method  of  data  partitioning  (also  called  homogeneous  multitasking).  This 
involves  creating  multiple,  identical  processes  and  assigning  a  portion  of  the 
data  to  each  processor.  For  the  environment  in  which  a  minimal  spanning  tree 
problem  is  a  subproblem  within  a  larger  system,  speedups  of  five  on  ten 
processors  were  achieved  with  both  Prim's  and  Boruvka's  algorithms.  The 
overhead  for  parallel  processing  on  the  Balance  21000  negates  most  of  the 
benefits  of  parallel  processing  for  the  first  solution  of  the  minimal  spanning 
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of 

Rome  Air  Development  Center 

RAVC  plan- s  and  executes  research,  development,  test 
and  selected  acquisition  programs  In  support  of. 
Command,  Control,  Communications  and  Intelligence 
(C^I)  activities.  Technical  and  engineering 
support  within  areas  o 6  competence  Is  provided  to 
ESV  Program  O^lces  { PQs  )  and  other  ESV  elements 
to  perform  elective  acquisition  C3I  systems. 

The  areas  o  6  technical  competence  Include 
communications ,  command  and  control,  battle 
management.  In  formation  processing,  surveillance 
sensors,  Intelligence  data  collection  and  handling, 
solid  state  sciences,  electromagnetics ,  and 
propagation,  and  electronic,  maintainability , 
and  compatibility. 
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